{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:22:03Z","timestamp":1773886923106,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2008,8]]},"abstract":"<jats:p>The frequent items problem is to process a stream of items and find all items occurring more than a given fraction of the time. It is one of the most heavily studied problems in data stream mining, dating back to the 1980s. Many applications rely directly or indirectly on finding the frequent items, and implementations are in use in large scale industrial systems. However, there has not been much comparison of the different methods under uniform experimental conditions. It is common to find papers touching on this topic in which important related work is mischaracterized, overlooked, or reinvented.<\/jats:p>\n          <jats:p>In this paper, we aim to present the most important algorithms for this problem in a common framework. We have created baseline implementations of the algorithms, and used these to perform a thorough experimental study of their properties. We give empirical evidence that there is considerable variation in the performance of frequent items algorithms. The best methods can be implemented to find frequent items with high accuracy using only tens of kilobytes of memory, at rates of millions of items per second on cheap modern hardware.<\/jats:p>","DOI":"10.14778\/1454159.1454225","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1530-1541","source":"Crossref","is-referenced-by-count":200,"title":["Finding frequent items in data streams"],"prefix":"10.14778","volume":"1","author":[{"given":"Graham","family":"Cormode","sequence":"first","affiliation":[{"name":"AT&amp;T Labs-Research, Florham Park, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marios","family":"Hadjieleftheriou","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research, Florham Park, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/237814.237823"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055558.1055598"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247510"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2007.4401052"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1109557.1109634"},{"key":"e_1_2_1_7_1","volume-title":"SIROCCO","author":"Bose P.","year":"2003","unstructured":"P. Bose , E. Kranakis , P. Morin , and Y. Tang . Bounds for frequency estimation of packet streams . In SIROCCO , 2003 . P. Bose, E. Kranakis, P. Morin, and Y. Tang. Bounds for frequency estimation of packet streams. In SIROCCO, 2003."},{"key":"e_1_2_1_9_1","series-title":"Automated Reasoning Series","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/978-94-011-3488-0_5","volume-title":"Automated Reasoning: Essays in Honor of Woody Bledsoe","author":"Boyer R. S.","year":"1991","unstructured":"R. S. Boyer and J. S. Moore . MJRTY - a fast majority vote algorithm . In Automated Reasoning: Essays in Honor of Woody Bledsoe , Automated Reasoning Series , pages 105 -- 117 . Kluwer Academic Publishers , 1991 . R. S. Boyer and J. S. Moore. MJRTY - a fast majority vote algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, pages 105--117. Kluwer Academic Publishers, 1991."},{"key":"e_1_2_1_10_1","volume-title":"ACM-SIAM Symposium on Discrete Algorithms","author":"Chakrabarti A.","year":"2007","unstructured":"A. Chakrabarti , G. Cormode , and A. McGregor . A near-optimal algorithm for computing the entropy of a stream . In ACM-SIAM Symposium on Discrete Algorithms , 2007 . A. Chakrabarti, G. Cormode, and A. McGregor. A near-optimal algorithm for computing the entropy of a stream. In ACM-SIAM Symposium on Discrete Algorithms, 2007."},{"key":"e_1_2_1_11_1","volume-title":"Procedings of the International Colloquium on Automata","author":"Charikar M.","year":"2002","unstructured":"M. Charikar , K. Chen , and M. Farach-Colton . Finding frequent items in data streams . In Procedings of the International Colloquium on Automata , Languages and Programming (ICALP) , 2002 . M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Procedings of the International Colloquium on Automata, Languages and Programming (ICALP), 2002."},{"key":"e_1_2_1_12_1","unstructured":"G. Cormode and M. Hadjieleftheriou. Finding Frequent Items in Data Streams: Source Code. http:\/\/www.research.att.com\/~marioh\/frequent-items.html.  G. Cormode and M. Hadjieleftheriou. Finding Frequent Items in Data Streams: Source Code. http:\/\/www.research.att.com\/~marioh\/frequent-items.html."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007575"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142389"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497562"},{"key":"e_1_2_1_16_1","unstructured":"G. Cormode and S. Muthukrishnan. MassDAL Public Code Bank. http:\/\/www.cs.rutgers.edu\/~muthu\/massdal-code-index.html.  G. Cormode and S. Muthukrishnan. MassDAL Public Code Bank. http:\/\/www.cs.rutgers.edu\/~muthu\/massdal-code-index.html."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.2004.1354567"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/545381.545466"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/647912.740658"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247503"},{"issue":"4","key":"e_1_2_1_22_1","first-page":"376","article-title":"Finding a majority among n votes: Solution to problem 81-5","volume":"3","author":"Fischer M.","year":"1982","unstructured":"M. Fischer and S. Salzburg . Finding a majority among n votes: Solution to problem 81-5 . Journal of Algorithms , 3 ( 4 ): 376 -- 379 , 1982 . M. Fischer and S. Salzburg. Finding a majority among n votes: Solution to problem 81-5. Journal of Algorithms, 3(4):376--379, 1982.","journal-title":"Journal of Algorithms"},{"key":"e_1_2_1_23_1","first-page":"454","volume-title":"International Conference on Very Large Data Bases","author":"Gilbert A. C.","year":"2002","unstructured":"A. C. Gilbert , Y. Kotidis , S. Muthukrishnan , and M. Strauss . How to summarize the universe: Dynamic maintenance of quantiles . In International Conference on Very Large Data Bases , pages 454 -- 465 , 2002 . A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In International Conference on Very Large Data Bases, pages 454--465, 2002."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375670"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30551-4_46"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1265530.1265565"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/762471.762473"},{"key":"e_1_2_1_28_1","volume-title":"Robust aggregation in sensor networks","author":"Kollios G.","year":"2005","unstructured":"G. Kollios , J. Byers , J. Considine , M. Hadjieleftheriou , and F. Li . Robust aggregation in sensor networks . IEEE Data Engineering Bulletin , 28(1), Mar. 2005 . G. Kollios, J. Byers, J. Considine, M. Hadjieleftheriou, and F. Li. Robust aggregation in sensor networks. IEEE Data Engineering Bulletin, 28(1), Mar. 2005."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142393"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1287369.1287400"},{"key":"e_1_2_1_31_1","unstructured":"G. S. Manku. Frequency counts over data streams. http:\/\/www.cse.ust.hk\/vldb2002\/VLDB2002-proceedings\/slides\/S10P03slides.pdf 2002.   G. S. Manku. Frequency counts over data streams. http:\/\/www.cse.ust.hk\/vldb2002\/VLDB2002-proceedings\/slides\/S10P03slides.pdf 2002."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30570-5_27"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1353343.1353418"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6423(82)90012-0"},{"issue":"4","key":"e_1_2_1_35_1","first-page":"277","article-title":"Interpreting the data: Parallel analysis with sawzall","volume":"13","author":"Pike R.","year":"2005","unstructured":"R. Pike , S. Dorward , R. Griesemer , and S. Quinlan . Interpreting the data: Parallel analysis with sawzall . Dynamic Grids and Worldwide Computing , 13 ( 4 ): 277 -- 298 , 2005 . R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with sawzall. Dynamic Grids and Worldwide Computing, 13(4):277--298, 2005.","journal-title":"Dynamic Grids and Worldwide Computing"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2007.896150"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031495.1031524"},{"key":"e_1_2_1_38_1","volume-title":"ACM-SIAM Symposium on Discrete Algorithms","author":"Thorup M.","year":"2000","unstructured":"M. Thorup . Even strongly universal hashing is pretty fast . In ACM-SIAM Symposium on Discrete Algorithms , 2000 . M. Thorup. Even strongly universal hashing is pretty fast. In ACM-SIAM Symposium on Discrete Algorithms, 2000."},{"key":"e_1_2_1_39_1","volume-title":"ACM-SIAM Symposium on Discrete Algorithms","author":"Thorup M.","year":"2004","unstructured":"M. Thorup and Y. Zhang . Tabulation based 4-universal hashing with applications to second moment estimation . In ACM-SIAM Symposium on Discrete Algorithms , 2004 . M. Thorup and Y. Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In ACM-SIAM Symposium on Discrete Algorithms, 2004."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1454159.1454225","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:54:02Z","timestamp":1672221242000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1454159.1454225"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2008,8]]}},"alternative-id":["10.14778\/1454159.1454225"],"URL":"https:\/\/doi.org\/10.14778\/1454159.1454225","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2008,8]]}}}