{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:54:34Z","timestamp":1775638474517,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key-Area Research and Development Program of Guangdong Province","award":["2020B0101390001"],"award-info":[{"award-number":["2020B0101390001"]}]},{"name":"ZTE Industry-University-Institute Cooperation Funds","award":["No. HC-CN-20220705007"],"award-info":[{"award-number":["No. HC-CN-20220705007"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No. U20A20179, No. 61872011"],"award-info":[{"award-number":["No. U20A20179, No. 61872011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Inner-product estimation is the base of many important tasks in a variety of big data scenarios, including measuring similarity of streams in data stream processing, estimating join size in database, and analyzing cosine similarity in various applications. Sketch, as a class of probability algorithms, is promising in inner-product estimation. However, existing sketch solutions suffer from low accuracy due to their neglect of the high skewness of real data. In this paper, we design a new sketch algorithm for accurate and unbiased inner-product estimation, namely JoinSketch. To improve accuracy, JoinSketch consists of multiple components, and records items with different frequency in different components. We theoretically prove that JoinSketch is unbiased, and has lower variance compared with the well-known AGMS and Fast-AGMS sketch. The experimental results show that JoinSketch improves the accuracy by 10 times in average while maintaining a comparable speed. All code is open-sourced at Github.<\/jats:p>","DOI":"10.1145\/3588935","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-26","source":"Crossref","is-referenced-by-count":16,"title":["JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-4322-2905","authenticated-orcid":false,"given":"Feiyu","family":"Wang","sequence":"first","affiliation":[{"name":"Peking University &amp; Pengcheng Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4020-6772","authenticated-orcid":false,"given":"Qizhi","family":"Chen","sequence":"additional","affiliation":[{"name":"Peking University &amp; Pengcheng Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7248-1312","authenticated-orcid":false,"given":"Yuanpeng","family":"Li","sequence":"additional","affiliation":[{"name":"Peking University &amp; Pengcheng Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2402-5854","authenticated-orcid":false,"given":"Tong","family":"Yang","sequence":"additional","affiliation":[{"name":"Peking University &amp; Pengcheng Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2616-2273","authenticated-orcid":false,"given":"Yaofeng","family":"Tu","sequence":"additional","affiliation":[{"name":"ZTE Corporation, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7232-2549","authenticated-orcid":false,"given":"Lian","family":"Yu","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1681-4677","authenticated-orcid":false,"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2016. Murmur Hashing source codes. https:\/\/github.com\/aappleby\/smhasher\/blob\/master\/src\/MurmurHash3.cpp."},{"key":"e_1_2_2_2_1","unstructured":"2023. Related Source code. https:\/\/github.com\/JoinSketch\/JoinSketch."},{"key":"e_1_2_2_3_1","volume-title":"Power-law distribution of the world wide web. science 287, 5461","author":"Adamic Lada A","year":"2000","unstructured":"Lada A Adamic and Bernardo A Huberman. 2000. Power-law distribution of the world wide web. science 287, 5461 (2000), 2115--2115."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.2001.1813"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1545"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 234--247","author":"Balachander K.","unstructured":"K. Balachander, S. Subhabrata, Z. Yin, and C. Yan. 2003. Sketch-based change detection: methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 234--247."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00080"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2016.7524364"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319894"},{"key":"e_1_2_2_10_1","volume-title":"The CAIDA UCSD Anonymized Internet Traces","author":"CAIDA.","year":"2018","unstructured":"CAIDA. 2018. The CAIDA UCSD Anonymized Internet Traces 2018. https:\/\/www.caida.org\/catalog\/datasets\/passive_dataset\/."},{"key":"e_1_2_2_11_1","volume-title":"Automata, Languages and Programming","author":"Charikar Moses","unstructured":"Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding frequent items in data streams. In Automata, Languages and Programming. Springer."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452784"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3487552.3487856"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750545"},{"key":"e_1_2_2_15_1","volume-title":"Proceedings of the 31st international conference on Very large data bases. 13--24","author":"Cormode Graham","year":"2005","unstructured":"Graham Cormode and Minos Garofalakis. 2005. Sketching streams through the net: Distributed approximate query tracking. In Proceedings of the 31st international conference on Very large data bases. 13--24."},{"key":"e_1_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Graham Cormode Minos Garofalakis Peter J Haas Chris Jermaine et al. 2011. Synopses for massive data: Samples histograms wavelets sketches. Foundations and Trends\u00ae in Databases 4 1--3 (2011) 1--294.","DOI":"10.1561\/1900000004"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_2_18_1","volume-title":"Degree sequence bound for join cardinality estimation. arXiv preprint arXiv:2201.04166","author":"Deeds Kyle","year":"2022","unstructured":"Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai. 2022. Degree sequence bound for join cardinality estimation. arXiv preprint arXiv:2201.04166 (2022)."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564699"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24741-8_32"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.61"},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Philippe Flajolet \u00c9ric Fusy Olivier Gandouet and Fr\u00e9d\u00e9ric Meunier. 2007. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science. Discrete Mathematics and Theoretical Computer Science 137--156.","DOI":"10.46298\/dmtcs.3545"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24741-8_33"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/233269.233340"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/11590156_24"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/153850.153875"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"F. M. Harper and J. A. Konstan. 2015. The MovieLens Datasets. ACM Transactions on Interactive Intelligent Systems (TiiS) (2015).","DOI":"10.1145\/2827872"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/115790.115835"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/169725.169708"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/568271.223841"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452840"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806416.2806515"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-017-0480-7"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3523210.3523220"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2021.3120085"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341302.3342076"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50038-X"},{"key":"e_1_2_2_39_1","volume-title":"International Conference on Database Theory. Springer.","author":"Metwally Ahmed","year":"2005","unstructured":"Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient computation of frequent and top-k elements in data streams. In International Conference on Database Theory. Springer."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/867576"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","unstructured":"Meikel Poess. 2018. TPC-DS. Springer International Publishing Cham 1--8. https:\/\/doi.org\/10.1007\/978--3--319--63962--8_127--1","DOI":"10.1007\/978--3--319--63962--8_127--1"},{"key":"e_1_2_2_42_1","volume-title":"VLDB","volume":"97","author":"Poosala Viswanath","year":"1997","unstructured":"Viswanath Poosala and Yannis E Ioannidis. 1997. Selectivity estimation without the attribute value independence assumption. In VLDB, Vol. 97. Citeseer, 486--495."},{"key":"e_1_2_2_43_1","doi-asserted-by":"crossref","unstructured":"David MW Powers. 1998. Applications and explanations of Zipf's law. In New methods in language processing and computational natural language learning.","DOI":"10.3115\/1603899.1603924"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882948"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247503"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386118.1386121"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2007.896150"},{"key":"e_1_2_2_48_1","first-page":"18","article-title":"Automating QoS and QoE Evaluation of HTTP Adaptive Streaming Systems","volume":"17","author":"Timmerer Christian","year":"2019","unstructured":"Christian Timmerer and Anatoliy Zabrovskiy. 2019. Automating QoS and QoE Evaluation of HTTP Adaptive Streaming Systems. ZTE Communications 17, 1 (2019), 18--24.","journal-title":"ZTE Communications"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183759"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824051"},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"S. Venkataraman D. Xiaodong Song P. B. Gibbons and A. Blum. 2005. New Streaming Algorithms for Fast Detection of Superspreaders. In NDSS.","DOI":"10.21236\/ADA461026"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00035"},{"key":"e_1_2_2_53_1","volume-title":"SketchINT: Empowering INT with TowerSketch for Per-flow Per-switch Measurement. In 2021 IEEE 29th International Conference on Network Protocols (ICNP). IEEE, 1--12","author":"Yang Kaicheng","year":"2021","unstructured":"Kaicheng Yang, Yuanpeng Li, Zirui Liu, Tong Yang, Yu Zhou, Jintao He, Tong Zhao, Zhengyi Jia, Yongqiang Yang, et al. 2021. SketchINT: Empowering INT with TowerSketch for Per-flow Per-switch Measurement. In 2021 IEEE 29th International Conference on Network Protocols (ICNP). IEEE, 1--12."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230544"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463701"},{"key":"e_1_2_2_56_1","first-page":"85","article-title":"Approach to Anomaly Detection in Microservice System with Multi-Source Data Streams","volume":"20","author":"Qixun ZHANG","year":"2022","unstructured":"Qixun ZHANG, Jing HAN, Li CHENG, Baisheng ZHANG, and Zican GONG. 2022. Approach to Anomaly Detection in Microservice System with Multi-Source Data Streams. ZTE Communications 20, 3 (2022), 85--92.","journal-title":"ZTE Communications"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425884"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467353"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452775"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183726"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588935","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588935","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:37Z","timestamp":1750178857000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588935"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588935"],"URL":"https:\/\/doi.org\/10.1145\/3588935","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}