{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T12:59:29Z","timestamp":1740142769542,"version":"3.37.3"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2022,8,9]],"date-time":"2022-08-09T00:00:00Z","timestamp":1660003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,14]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The sketch is one of the typical and widely used data structures for estimating the frequencies of items in data streams. However, since the counter sizes in traditional rectangular sketch (r-sketch) are the same, it is hard to achieve small space usage, high capacity (i.e. the maximum frequency can be recorded) and high estimated accuracy simultaneously. Moreover, when considering the high skewness of data streams, this problem will become even worse. Consequently, we propose the trapezoidal sketch (t-sketch) in this paper. In the t-sketch, different from the r-sketch, the counter sizes in different layers are different. Therefore, the low space usage and high capacity can be achieved simultaneously in the t-sketch. Moreover, based on the basic t-sketch, we propose the space-saving t-sketch and the capacity-improvement t-sketch and analyze the properties of these two t-sketches. Finally, for improving the estimation accuracy of the t-sketch further, we propose the probabilistic-based estimation error-reducing algorithm. Compared with the CM sketch, CU sketch, C sketch and A sketch, the simulation results show that the performances on space usage, capacity and estimation accuracy are improved successfully by the space-saving t-sketch and the capacity-improvement t-sketch.<\/jats:p>","DOI":"10.1093\/comjnl\/bxac111","type":"journal-article","created":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T09:28:55Z","timestamp":1660210135000},"page":"2656-2673","source":"Crossref","is-referenced-by-count":0,"title":["Trapezoidal Sketch: A Sketch Structure for Frequency Estimation of Data Streams"],"prefix":"10.1093","volume":"66","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8567-4025","authenticated-orcid":false,"given":"Ning","family":"Li","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology , Weihai 264209, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6616-9640","authenticated-orcid":false,"given":"Xin","family":"Yuan","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology , Weihai 264209, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jose-Fernan Martinez","family":"Ortega","sequence":"additional","affiliation":[{"name":"Universidad Polit\u00e9cnica de Madrid , Madrid 28031, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vicente Hernandez","family":"Diaz","sequence":"additional","affiliation":[{"name":"Universidad Polit\u00e9cnica de Madrid , Madrid 28031, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,8,9]]},"reference":[{"key":"2023111618464478700_ref1","doi-asserted-by":"crossref","first-page":"1622","DOI":"10.1109\/TNET.2012.2192447","article-title":"Per-flow traffic measurement through randomized counter sharing","volume":"20","author":"Li","year":"2012","journal-title":"IEEE\/ACM Tran.s Netw."},{"key":"2023111618464478700_ref2","first-page":"35","volume-title":"Proc. ACM SIGMOD","author":"Cormode","year":"2004"},{"key":"2023111618464478700_ref3","first-page":"1449","volume-title":"Proc. ACM SIGMOD","author":"Roy","year":"2016"},{"volume-title":"Sketch Techniques for Approximate Query Processing. Foundations and Trends in Databases","year":"2011","author":"Cormode","key":"2023111618464478700_ref4"},{"key":"2023111618464478700_ref5","first-page":"802","volume-title":"Proc. SDM, Ohio, USA","author":"Aggarwal","year":"2010"},{"key":"2023111618464478700_ref6","first-page":"1","volume-title":"Proc. IEEE ICDE","author":"6D. Thomas\/\/\/","year":"2009"},{"key":"2023111618464478700_ref7","first-page":"1","volume-title":"Proc. ACM STOC","author":"Gilbert","year":"2007"},{"key":"2023111618464478700_ref8","first-page":"468","volume-title":"EMNLP-CoNLL","author":"Talbot","year":"2007"},{"key":"2023111618464478700_ref9","first-page":"1574","volume-title":"IJCAI","author":"Durme","year":"2009"},{"key":"2023111618464478700_ref10","first-page":"1","volume-title":"Proc. ACM SIGMOD","author":"Polyzotis","year":"2004"},{"key":"2023111618464478700_ref11","first-page":"1","volume-title":"Proc. ACM SIGMOD","author":"Spiegel","year":"2006"},{"key":"2023111618464478700_ref12","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1007\/s10618-010-0185-7","article-title":"Mining top-k frequent itemsets through progressive sampling","volume":"21","author":"Pietracaprina","year":"2010","journal-title":"Data Min. Knowl. Discov."},{"key":"2023111618464478700_ref13","first-page":"1","volume-title":"Proc. IEEE ICDE","author":"Yang","year":"2017"},{"key":"2023111618464478700_ref14","first-page":"1","volume-title":"Proc. IEEE Globecom","author":"Zhou","year":"2017"},{"first-page":"1","year":"2017","author":"Yang","key":"2023111618464478700_ref15"},{"key":"2023111618464478700_ref16","first-page":"1","volume-title":"Proc. ACM SIGCOMM","author":"Yang","year":"2018"},{"key":"2023111618464478700_ref17","first-page":"408","volume-title":"Prof. VLDB","author":"Yang","year":"2016"},{"key":"2023111618464478700_ref18","doi-asserted-by":"crossref","first-page":"3116","DOI":"10.1109\/TNET.2017.2730227","article-title":"A shifting framework for set queries","volume":"25","author":"Yang","year":"2017","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"2023111618464478700_ref19","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.jalgor.2003.12.001","article-title":"An improved data stream summary: the count-min sketch and its applications","volume":"55","author":"Cormode","year":"2005","journal-title":"J. Algorithms"},{"key":"2023111618464478700_ref20","first-page":"270","article-title":"New directions in traffic measurement and accounting","volume":"32","author":"Estan","year":"2002","journal-title":"ACM SIGMCOMM CCR"},{"key":"2023111618464478700_ref21","first-page":"1530","volume-title":"Proc. VLDB","author":"Cormode","year":"2008"},{"key":"2023111618464478700_ref22","first-page":"1","volume-title":"Proc. ACM SIGCOMM","author":"Huang","year":"2108"},{"key":"2023111618464478700_ref23","first-page":"1027","volume-title":"Proc. USENIX NSDI","author":"Huang","year":"2021"},{"key":"2023111618464478700_ref24","first-page":"334","volume-title":"Proc. ACM SIGCOMM","author":"Liu","year":"2019"},{"key":"2023111618464478700_ref25","first-page":"991","volume-title":"Proc. USENIX NSDI","author":"Zhao","year":"2021"},{"key":"2023111618464478700_ref26","first-page":"113","volume-title":"Proc. ACM SIGCOMM","author":"Huang","year":"2017"},{"key":"2023111618464478700_ref27","first-page":"207","volume-title":"Proc. ACM SIGCOMM","author":"Zhang","year":"2021"},{"key":"2023111618464478700_ref28","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1109\/90.851975","article-title":"Summary cache: a scalable wide-area web cache sharing protocol","volume":"8","author":"Fan","year":"2000","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"2023111618464478700_ref29","first-page":"241","volume-title":"Proc. ACM SIGMOD","author":"Cohen","year":"2003"},{"key":"2023111618464478700_ref30","first-page":"1","volume-title":"Proc. ACM SIGMOD","author":"Aguilar-Saborit","year":"2006"},{"author":"Real-life transactional dataset","key":"2023111618464478700_ref31"},{"key":"2023111618464478700_ref32","doi-asserted-by":"crossref","first-page":"151","DOI":"10.3115\/1603899.1603924","volume-title":"Proc. NeMLaP3-CoNLL","author":"Powers","year":"1998"},{"key":"2023111618464478700_ref33","first-page":"187","article-title":"High-performance benchmarking with web polygraph","volume":"34","author":"Rousskov","year":"2004","journal-title":"Software"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/11\/2656\/53401098\/bxac111.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/11\/2656\/53401098\/bxac111.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T18:47:33Z","timestamp":1700160453000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/66\/11\/2656\/6658859"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,9]]},"references-count":33,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,8,9]]},"published-print":{"date-parts":[[2023,11,14]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxac111","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2023,11]]},"published":{"date-parts":[[2022,8,9]]}}}