{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T23:10:44Z","timestamp":1715037044048},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>\n            Many data sets occur as\n            <jats:italic>unaggregated data sets<\/jats:italic>\n            , where multiple data points are associated with each key. In the\n            <jats:italic>aggregate view<\/jats:italic>\n            of the data, the\n            <jats:italic>weight<\/jats:italic>\n            of a key is the sum of the weights of data points associated with the key. Examples are measurements of IP packet header streams, distributed data streams produced by events registered by sensor networks, and Web page or multimedia requests to context distribution servers. We aim to combine sampling and aggregation to provide accurate and efficient summaries of the aggregate view. However, data points are scattered in time or across multiple servers and hence aggregation is subject to resource constraints on the size of summaries that can be stored or transmitted.\n          <\/jats:p>\n          <jats:p>We develop a summarization framework for unaggregated data where summarization is a scalable and composable operator, and as such, can be tailored to meet resource constraints. Our summaries support unbiased estimates of the weight of subpopulations of keys specified using arbitrary selection predicates. While we prove that under such scenarios there is no variance optimal scheme, our estimators have the desirable properties that the variance is progressively closer to the minimum possible when applied to a \"more\" aggregated data set. An extensive evaluation using synthetic and real data sets shows that our summarization framework outperforms all existing schemes for this fundamental problem, even for the special and well-studied case of data streams.<\/jats:p>","DOI":"10.14778\/1687627.1687677","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"431-442","source":"Crossref","is-referenced-by-count":3,"title":["Composable, scalable, and accurate weight summarization of unaggregated data sets"],"prefix":"10.14778","volume":"2","author":[{"given":"Edith","family":"Cohen","sequence":"first","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}]},{"given":"Nick","family":"Duffield","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}]},{"given":"Haim","family":"Kaplan","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"given":"Carsten","family":"Lund","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}]},{"given":"Mikkel","family":"Thorup","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/69.3.653"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304206"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1298306.1298344"},{"key":"e_1_2_1_4_1","volume-title":"Manuscript","author":"Cohen E.","year":"2007","unstructured":"E. Cohen , N. Duffield , H. Kaplan , C. Lund , and M. Thorup . Algorithms and estimators for accurate summarization of Internet traffic . Manuscript , 2007 . E. Cohen, N. Duffield, H. Kaplan, C. Lund, and M. Thorup. Algorithms and estimators for accurate summarization of Internet traffic. Manuscript, 2007."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1265530.1265566"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/1496770.1496906"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375457.1375471"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281100.1281133"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453884"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/948205.948228"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1314690.1314696"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2005.846400"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/633025.633056"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1962.10480667"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276334"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177700375"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1952.10483446"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066159"},{"key":"e_1_2_1_19_1","volume-title":"2: Seminumerical Algorithms","author":"Knuth D. E.","year":"1969","unstructured":"D. E. Knuth . The Art of Computer Programming , Vol. 2: Seminumerical Algorithms . Addison-Wesley , 1969 . D. E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Addison-Wesley, 1969."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSECP.2003.1219056"},{"key":"e_1_2_1_21_1","unstructured":"The Netflix Prize. http:\/\/www.netflixprize.com\/.  The Netflix Prize. http:\/\/www.netflixprize.com\/."},{"key":"e_1_2_1_22_1","unstructured":"Cisco NetFlow. http:\/\/www.cisco.com\/warp\/public\/732\/Tech\/netflow.  Cisco NetFlow. http:\/\/www.cisco.com\/warp\/public\/732\/Tech\/netflow."},{"issue":"2","key":"e_1_2_1_23_1","first-page":"149","article-title":"Sequential poisson sampling","volume":"14","author":"Ohlsson E.","year":"1998","unstructured":"E. Ohlsson . Sequential poisson sampling . J. Official Statistics , 14 ( 2 ): 149 -- 162 , 1998 . E. Ohlsson. Sequential poisson sampling. J. Official Statistics, 14(2):149--162, 1998.","journal-title":"J. Official Statistics"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177692620"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-3758(96)00185-1"},{"key":"e_1_2_1_26_1","volume-title":"CRC press","author":"Sampath S.","year":"2000","unstructured":"S. Sampath . Sampling Theory and Methods. CRC press , 2000 . S. Sampath. Sampling Theory and Methods. CRC press, 2000."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-1404-4"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346966"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1778580.1778591"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/83.1.238"},{"key":"e_1_2_1_31_1","volume-title":"Sampling Algorithms","author":"Till\u00e9 Y.","year":"2006","unstructured":"Y. Till\u00e9 . Sampling Algorithms . Springer-Verlag , New York , 2006 . Y. Till\u00e9. Sampling Algorithms. Springer-Verlag, New York, 2006."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687677","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:28:16Z","timestamp":1672226896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687677"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687677"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687677","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}