{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T12:17:50Z","timestamp":1767183470605,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2008,1,1]],"date-time":"2008-01-01T00:00:00Z","timestamp":1199145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["02-05116","220280"],"award-info":[{"award-number":["02-05116","220280"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2008,1]]},"abstract":"<jats:p>\n            Data items that arrive online as streams typically have attributes which take values from one or more hierarchies (time and geographic location, source and destination IP addresses, etc.). Providing an aggregate view of such data is important for summarization, visualization, and analysis. We develop an aggregate view based on certain organized sets of large-valued regions (\u201cheavy hitters\u201d) corresponding to hierarchically discounted frequency counts. We formally define the notion of\n            <jats:italic>hierarchical heavy hitters<\/jats:italic>\n            (HHHs). We first consider computing (approximate) HHHs over a data stream drawn from a single hierarchical attribute. We formalize the problem and give deterministic algorithms to find them in a single pass over the input.\n          <\/jats:p>\n          <jats:p>In order to analyze a wider range of realistic data streams (e.g., from IP traffic-monitoring applications), we generalize this problem to multiple dimensions. Here, the semantics of HHHs are more complex, since a \u201cchild\u201d node can have multiple \u201cparent\u201d nodes. We present online algorithms that find approximate HHHs in one pass, with provable accuracy guarantees. The product of hierarchical dimensions forms a mathematical lattice structure. Our algorithms exploit this structure, and so are able to track approximate HHHs using only a small, fixed number of statistics per stored item, regardless of the number of dimensions.<\/jats:p>\n          <jats:p>We show experimentally, using real data, that our proposed algorithms yields outputs which are very similar (virtually identical, in many cases) to offline computations of the exact solutions, whereas straightforward heavy-hitters-based approaches give significantly inferior answer quality. Furthermore, the proposed algorithms result in an order of magnitude savings in data structure size while performing competitively.<\/jats:p>","DOI":"10.1145\/1324172.1324174","type":"journal-article","created":{"date-parts":[[2008,2,8]],"date-time":"2008-02-08T15:32:16Z","timestamp":1202484736000},"page":"1-48","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":63,"title":["Finding hierarchical heavy hitters in streaming data"],"prefix":"10.1145","volume":"1","author":[{"given":"Graham","family":"Cormode","sequence":"first","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Flip","family":"Korn","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S.","family":"Muthukrishnan","sequence":"additional","affiliation":[{"name":"Rutgers University, Piscataway, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs--Research, Florham Park, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,2,2]]},"reference":[{"volume-title":"Proceedings of the International Conference on Very Large Data Bases.","author":"Agarwal S.","key":"e_1_2_1_1_1","unstructured":"Agarwal , S. , Agrawal , R. , Deshpande , P. , Gupta , A. , Naughton , J. F. , Ramakrishnan , R. , and Sarawagi , S . 1996. On the computation of multidimensional aggregates . In Proceedings of the International Conference on Very Large Data Bases. Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J. F., Ramakrishnan, R., and Sarawagi, S. 1996. On the computation of multidimensional aggregates. In Proceedings of the International Conference on Very Large Data Bases."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304214"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Charikar M. Chen K. and Farach-Colton M. 2002. Finding frequent items in data streams. In Procedings of the International Colloquium on Automata Languages and Programming (ICALP) 693--703.   Charikar M. Chen K. and Farach-Colton M. 2002. Finding frequent items in data streams. In Procedings of the International Colloquium on Automata Languages and Programming (ICALP) 693--703.","DOI":"10.1007\/3-540-45465-9_59"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007575"},{"volume-title":"Proceedings of the International Conference on Very Large Data Bases, 464--475","author":"Cormode G.","key":"e_1_2_1_5_1","unstructured":"Cormode , G. , Korn , F. , Muthukrishnan , S. , and Srivastava , D . 2003. Finding hierarchical heavy hitters in data streams . In Proceedings of the International Conference on Very Large Data Bases, 464--475 . Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. 2003. Finding hierarchical heavy hitters in data streams. In Proceedings of the International Conference on Very Large Data Bases, 464--475."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007588"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/773153.773182"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872838"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the European Symposium on Algorithms (ESA). Lecture Notes in Computer Science","volume":"2461","author":"Demaine E.","unstructured":"Demaine , E. , L\u00f3pez-Ortiz , A. , and Munro , J. I . 2002. Frequency estimation of internet packet streams with limited space . In Proceedings of the European Symposium on Algorithms (ESA). Lecture Notes in Computer Science , vol. 2461 . Springer, Berlin, 348--360. Demaine, E., L\u00f3pez-Ortiz, A., and Munro, J. I. 2002. Frequency estimation of internet packet streams with limited space. In Proceedings of the European Symposium on Algorithms (ESA). Lecture Notes in Computer Science, vol. 2461. Springer, Berlin, 348--360."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/863955.863972"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/635484.635487"},{"volume-title":"Proceedings of the International Conference on Very Large Data Bases, 454--465","author":"Gilbert A. C.","key":"e_1_2_1_13_1","unstructured":"Gilbert , A. C. , Kotidis , Y. , Muthukrishnan , S. , and Strauss , M . 2002. How to summarize the universe: Dynamic maintenance of quantiles . In Proceedings of the International Conference on Very Large Data Bases, 454--465 . Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., and Strauss, M. 2002. How to summarize the universe: Dynamic maintenance of quantiles. In Proceedings of the International Conference on Very Large Data Bases, 454--465."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/380752.380841"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065167.1065211"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/762471.762473"},{"volume-title":"Proceedings of the International Conference on Very Large Data Bases, 766--777","author":"Lakshmanan L. V. S.","key":"e_1_2_1_17_1","unstructured":"Lakshmanan , L. V. S. , Ng , R. T. , Wang , C. X. , Zhou , X. , and Johnson , T . 2002. The generalized MDL approach for summarization . In Proceedings of the International Conference on Very Large Data Bases, 766--777 . Lakshmanan, L. V. S., Ng, R. T., Wang, C. X., Zhou, X., and Johnson, T. 2002. The generalized MDL approach for summarization. In Proceedings of the International Conference on Very Large Data Bases, 766--777."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304200"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1053724.1054115"},{"volume-title":"Proceedings of the International Conference on Very Large Data Bases, 346--357","author":"Manku G.","key":"e_1_2_1_20_1","unstructured":"Manku , G. and Motwani , R . 2002. Approximate frequency counts over data streams . In Proceedings of the International Conference on Very Large Data Bases, 346--357 . Manku, G. and Motwani, R. 2002. Approximate frequency counts over data streams. In Proceedings of the International Conference on Very Large Data Bases, 346--357."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30570-5_27"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6423(82)90012-0"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375666"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564741"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/288627.288645"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028788.1028802"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1324172.1324174","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1324172.1324174","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:56:15Z","timestamp":1750254975000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1324172.1324174"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,1]]},"references-count":26,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2008,1]]}},"alternative-id":["10.1145\/1324172.1324174"],"URL":"https:\/\/doi.org\/10.1145\/1324172.1324174","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2008,1]]},"assertion":[{"value":"2007-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-02-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}