{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T04:41:11Z","timestamp":1773895271988,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key-Area Research and Development Program of Guangdong Province","award":["2020B0101390001"],"award-info":[{"award-number":["2020B0101390001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No. U20A20179"],"award-info":[{"award-number":["No. U20A20179"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Finding top-K frequent items has been a hot topic in data stream processing in recent years, which has a wide range of applications. However, most of existing sketch algorithms focuses on finding local top-K in a single data stream. In this paper, we work on finding global top-K in multiple disjoint data streams. We find that directly deploying prior sketch algorithms is often unfair under global scenarios, which will degrade the accuracy of global top-K. We define top-K-fairness and show that it is important for finding global top-K. To achieve top-K-fairness, we propose a new sketch framework, called the Double-Anonymous sketch. The process of finding global top-K items is similar to that of paper reviewing and democratic elections. In these scenarios, double-anonymity is often an effective strategy to achieve top-K-fairness. We also propose two techniques, hot panning, and early freezing, to further improve the accuracy. We theoretically prove that the Double-Anonymous sketch achieves top-K-fairnesswhile keeping high accuracy. We perform extensive experiments to verify top-K-fairness in the scenario of disjoint data streams. The experimental results show that the Double-Anonymous sketch's error is up to 129 times (60 times on average) smaller than the state-of-the-art. All the related source code is open-sourced and available at Github.<\/jats:p>","DOI":"10.1145\/3588933","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-26","source":"Crossref","is-referenced-by-count":20,"title":["Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2495-7774","authenticated-orcid":false,"given":"Yikai","family":"Zhao","sequence":"first","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8262-5658","authenticated-orcid":false,"given":"Wenchen","family":"Han","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5901-0858","authenticated-orcid":false,"given":"Zheng","family":"Zhong","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4060-287X","authenticated-orcid":false,"given":"Yinda","family":"Zhang","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2402-5854","authenticated-orcid":false,"given":"Tong","family":"Yang","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1681-4677","authenticated-orcid":false,"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2004. Real-Life Transactional Dataset. http:\/\/fimi.ua.ac.be\/data\/."},{"key":"e_1_2_2_2_1","volume-title":"Anonymized Internet Traces","year":"2016","unstructured":"2016. Anonymized Internet Traces 2016. https:\/\/catalog.caida.org\/dataset\/passive_2016_pcap."},{"key":"e_1_2_2_3_1","unstructured":"2023. Source code related to Double-Anonymous sketch. https:\/\/github.com\/Arimase97\/Double-Anonymous-Sketch."},{"key":"e_1_2_2_4_1","unstructured":"K. Balachander S. Subhabrata Z. Yin and C. Yan. 2003. Sketch-based change detection: methods evaluation and applications. In SIGCOMM."},{"key":"e_1_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Ran Ben-Basat Gil Einziger Roy Friedman and etal. 2017. Randomized admission policy for efficient top-k and frequency estimation. In INFOCOM.","DOI":"10.1109\/INFOCOM.2017.8057215"},{"key":"e_1_2_2_6_1","volume-title":"Jayasena","author":"Breslow Alex D.","year":"2018","unstructured":"Alex D. Breslow and Nuwan S. Jayasena. 2018. Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity. (2018)."},{"key":"e_1_2_2_7_1","volume-title":"Automata, Languages and Programming","author":"Charikar Moses","unstructured":"Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding frequent items in data streams. In Automata, Languages and Programming. Springer."},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Peiqing Chen Dong Chen Lingxiao Zheng Jizhou Li and Tong Yang. 2021. Out of Many We are One: Measuring Item Batch with Clock-Sketch. SIGMOD.","DOI":"10.1145\/3448016.3452784"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3487552.3487856"},{"key":"e_1_2_2_10_1","volume-title":"An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms","author":"Cormode Graham","year":"2005","unstructured":"Graham Cormode and S Muthukrishnan. 2005. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms (2005)."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2005.860096"},{"key":"e_1_2_2_12_1","unstructured":"Haipeng Dai Muhammad Shahzad Alex X Liu and etal. 2016. Finding persistent items in data streams. VLDB Endowment (2016)."},{"key":"e_1_2_2_13_1","unstructured":"Zhenwei Dai Aditya Desai Reinhard Heckel and Anshumali Shrivastava. 2021. Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix. In SIGMOD."},{"key":"e_1_2_2_14_1","volume-title":"New estimation algorithms for streaming data: Count-min can do more. Webdocs. Cs. Ualberta. Ca","author":"Deng Fan","year":"2007","unstructured":"Fan Deng and Davood Rafiei. 2007. New estimation algorithms for streaming data: Count-min can do more. Webdocs. Cs. Ualberta. Ca (2007)."},{"key":"e_1_2_2_15_1","volume-title":"New directions in traffic measurement and accounting. SIGMCOMM","author":"Estan Cristian","year":"2002","unstructured":"Cristian Estan and George Varghese. 2002. New directions in traffic measurement and accounting. SIGMCOMM (2002)."},{"key":"e_1_2_2_16_1","unstructured":"Xiangyang Gou Long He Yinda Zhang and etal. 2020. Sliding sketches: A framework using time zones for data stream processing in sliding windows. In SIGKDD."},{"key":"e_1_2_2_17_1","volume-title":"Tharun Medini, Todd Treangen, and Anshumali Shrivastava.","author":"Gupta Gaurav","year":"2021","unstructured":"Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, RA Leo Elworth, Tharun Medini, Todd Treangen, and Anshumali Shrivastava. 2021. Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO). In SIGMOD."},{"key":"e_1_2_2_18_1","unstructured":"Z. Haida H. Zengfeng W. Zhewei and etal. 2017. Tracking Matrix Approximation over Distributed Sliding Windows. In ICDE."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452840"},{"key":"e_1_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Peng Jia Pinghui Wang Junzhou Zhao Shuo Zhang Yiyan Qi Min Hu Chao Deng and Xiaohong Guan. 2021. Bidirectionally Densifying LSH Sketches with Empty Bins. In SIGMOD.","DOI":"10.1145\/3448016.3452833"},{"key":"e_1_2_2_21_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data."},{"key":"e_1_2_2_22_1","unstructured":"Jizhou Li Zikun Li Yifei Xu and etal. 2020. WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams. In SIGKDD."},{"key":"e_1_2_2_23_1","unstructured":"Rundong Li Pinghui Wang Jiongli Zhu Junzhou Zhao Jia Di Xiaofei Yang and Kai Ye. 2021. Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying. In SIGMOD."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2012.2192447"},{"key":"e_1_2_2_25_1","unstructured":"Yuliang Li Rui Miao Changhoon Kim and etal. 2016. FlowRadar: a better NetFlow for data centers. In NSDI."},{"key":"e_1_2_2_26_1","volume-title":"Proc. ACM SIGCOMM.","author":"Liu Zaoxing","unstructured":"Zaoxing Liu, Antonis Manousis, and et al. 2016. One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon. In Proc. ACM SIGCOMM."},{"key":"e_1_2_2_27_1","unstructured":"G. Lukasz D. David D. Erik D L. Alejandro and M. J Ian. 2003. Identifying frequent items in sliding windows over on-line packet streams. In IMC."},{"key":"e_1_2_2_28_1","doi-asserted-by":"crossref","unstructured":"Ahmed Metwally Divyakant Agrawal and Amr El Abbadi. 2005. Efficient computation of frequent and top-k elements in data streams. In ICDT.","DOI":"10.1007\/978-3-540-30570-5_27"},{"key":"e_1_2_2_29_1","unstructured":"T. Nan C. Qing and M. Prasenjit. 2016. Graph stream summarization: From big bang to big crunch. In SIGMOD."},{"key":"e_1_2_2_30_1","doi-asserted-by":"crossref","unstructured":"Prashant Pandey Michael A. Bender Rob Johnson and Rob Patro. 2017. A General-Purpose Counting Filter: Making Every Bit Count. In SIGMOD.","DOI":"10.1145\/3035918.3035963"},{"key":"e_1_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Prashant Pandey Alex Conway Joe Durie Michael A Bender Martin Farach-Colton and Rob Johnson. 2021. Vector Quotient Filters: Overcoming the Time\/Space Trade-Off in Filter Design. In SIGMOD.","DOI":"10.1145\/3448016.3452841"},{"key":"e_1_2_2_32_1","volume-title":"On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets. SIGCOMM computer communication review","author":"Park Kihong","year":"2001","unstructured":"Kihong Park and Heejo Lee. 2001. On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets. SIGCOMM computer communication review (2001)."},{"key":"e_1_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Yanqing Peng Jinwei Guo Feifei Li and etal. 2018. Persistent bloom filter: Membership testing for the entire history. In SIGMOD.","DOI":"10.1145\/3183713.3183737"},{"key":"e_1_2_2_34_1","doi-asserted-by":"crossref","unstructured":"David MW Powers. 1998. Applications and explanations of Zipf's law. In New methods in language processing and computational natural language learning.","DOI":"10.3115\/1603899.1603924"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882948"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"A\u00e9cio Santos Aline Bessa Fernando Chirigati Christopher Musco and Juliana Freire. 2021. Correlation sketches for approximate join-correlation queries. In SIGMOD.","DOI":"10.1145\/3448016.3458456"},{"key":"e_1_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Robert Schweller Zhichun Li Yan Chen and etal. 2007. Reversible sketches: enabling monitoring and analysis over high-speed data streams. TON (2007).","DOI":"10.1109\/TNET.2007.896150"},{"key":"e_1_2_2_39_1","unstructured":"Benwei Shi Zhuoyue Zhao Yanqing Peng Feifei Li and Jeff M Phillips. 2021. At-the-time and Back-in-time Persistent Sketches. In SIGMOD."},{"key":"e_1_2_2_40_1","volume-title":"Arnd Christian Konig, and Mikhail Bilenko","author":"Shrivastava Anshumali","year":"2016","unstructured":"Anshumali Shrivastava, Arnd Christian Konig, and Mikhail Bilenko. 2016. Time adaptive sketches (ada-sketches) for summarizing data streams. In SIGMOD."},{"key":"e_1_2_2_41_1","unstructured":"M. Gurmeet Singh and M. Rajeev. 2002. Approximate frequency counts over data streams. In VLDB."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/863955.863963"},{"key":"e_1_2_2_43_1","unstructured":"Kai Sheng Tai Vatsal Sharan Peter Bailis and etal. 2018. Sketching linear classifiers over data streams. In SIGMOD."},{"key":"e_1_2_2_44_1","doi-asserted-by":"crossref","unstructured":"Daniel Ting. 2018. Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions. In SIGKDD.","DOI":"10.1145\/3219819.3219975"},{"key":"e_1_2_2_45_1","volume-title":"Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation. In SIGMOD Conference.","author":"Ting Daniel","year":"2018","unstructured":"Daniel Ting. 2018. Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation. In SIGMOD Conference."},{"key":"e_1_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Daniel Ting and Rick Cole. 2021. Conditional Cuckoo Filters. In SIGMOD.","DOI":"10.1145\/3448016.3452811"},{"key":"e_1_2_2_47_1","doi-asserted-by":"crossref","unstructured":"S. Venkataraman D. Xiaodong Song P. B. Gibbons and A. Blum. 2005. New Streaming Algorithms for Fast Detection of Superspreaders. In NDSS.","DOI":"10.21236\/ADA461026"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3364324.3364333"},{"key":"e_1_2_2_49_1","doi-asserted-by":"crossref","unstructured":"Pinghui Wang Yiyan Qi Yuanming Zhang and etal. 2019. A memory-efficient sketch method for estimating high similarities in streaming sets. In SIGKDD.","DOI":"10.1145\/3292500.3330825"},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Zhewei Wei Ge Luo Ke Yi and etal. 2015. Persistent data sketching. In SIGMOD.","DOI":"10.1145\/2723372.2749443"},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Tong Yang Junzhi Gong Haowei Zhang and etal. 2018. HeavyGuardian: Separate and Guard Hot Items in Data Streams. In SIGKDD.","DOI":"10.1145\/3219819.3219978"},{"key":"e_1_2_2_52_1","doi-asserted-by":"crossref","unstructured":"Tong Yang Jie Jiang Peng Liu and etal. 2018. Elastic sketch: adaptive and fast network-wide measurements. In SIGCOMM.","DOI":"10.1145\/3230543.3230544"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137652"},{"key":"e_1_2_2_54_1","volume-title":"Hao Zhang, Qiyan Li, and Yu Rong.","author":"Zhao Kangfei","year":"2021","unstructured":"Kangfei Zhao, Jeffrey Xu Yu, Hao Zhang, Qiyan Li, and Yu Rong. 2021. A Learned Sketch for Subgraph Counting. In SIGMOD."},{"key":"e_1_2_2_55_1","unstructured":"Yikai Zhao Kaicheng Yang Zirui Liu Tong Yang Li Chen Shiyi Liu Naiqian Zheng Ruixin Wang Hanbo Wu Yi Wang et al. 2021. LightGuardian: A Full-Visibility Lightweight In-band Telemetry System Using Sketchlets.. In NSDI. 991--1010."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00017"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467217"},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Zheng Zhong Shen Yan Zikun Li Decheng Tan Tong Yang and Bin Cui. 2021. BurstSketch: Finding Bursts in Data Streams. In SIGMOD.","DOI":"10.1145\/3448016.3452775"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588933","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588933","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:37Z","timestamp":1750178857000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588933"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":57,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588933"],"URL":"https:\/\/doi.org\/10.1145\/3588933","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}