{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:05:13Z","timestamp":1775815513240,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:p>Given a weighted set<jats:italic>S<\/jats:italic>of<jats:italic>n<\/jats:italic>elements,<jats:italic>weighted set sampling (WSS)<\/jats:italic>samples an element in<jats:italic>S<\/jats:italic>so that each element<jats:italic>a<jats:sub>i<\/jats:sub><\/jats:italic>; is sampled with a probability proportional to its weight<jats:italic>w<\/jats:italic>(<jats:italic>a<jats:sub>i<\/jats:sub><\/jats:italic>). The classic alias method pre-processes an index in<jats:italic>O<\/jats:italic>(<jats:italic>n<\/jats:italic>) time with<jats:italic>O<\/jats:italic>(<jats:italic>n<\/jats:italic>) space and handles WSS with<jats:italic>O<\/jats:italic>(1) time. Yet, the alias method does not support dynamic updates. By minor modifications of existing dynamic WSS schemes, it is possible to achieve an expected<jats:italic>O<\/jats:italic>(1) update time and draw<jats:italic>t<\/jats:italic>independent samples in expected<jats:italic>O<\/jats:italic>(<jats:italic>t<\/jats:italic>) time with linear space, which is theoretically optimal. But such a method is impractical and even slower than a binary search tree-based solution. How to support both efficient sampling and updates in practice is still challenging. Motivated by this, we design<jats:italic>BUS<\/jats:italic>, an efficient scheme that handles an update in<jats:italic>O<\/jats:italic>(1) amortized time and draws<jats:italic>t<\/jats:italic>independent samples in<jats:italic>O<\/jats:italic>(log<jats:italic>n + t)<\/jats:italic>time with linear space.<\/jats:p><jats:p>A natural extension of WSS is the<jats:italic>weighted independent range sampling (WIRS)<\/jats:italic>, where each element in<jats:italic>S<\/jats:italic>is a data point from R. Given an arbitrary range<jats:italic>Q<\/jats:italic>= [\u2113,<jats:italic>r<\/jats:italic>] at query time, WIRS aims to do weighted set sampling on the set<jats:italic>S<jats:sub>Q<\/jats:sub><\/jats:italic>of data points falling into range<jats:italic>Q.<\/jats:italic>We show that by integrating the theoretically optimal dynamic WSS scheme mentioned above, it can handle an update in<jats:italic>O<\/jats:italic>(log<jats:italic>n<\/jats:italic>) time and can draw<jats:italic>t<\/jats:italic>independent samples for WIRS in<jats:italic>O<\/jats:italic>(log<jats:italic>n + t<\/jats:italic>) time, the same as the state-of-the-art static algorithm. Again, such a solution by integrating the optimal dynamic WSS scheme is still impractical to handle WIRS queries. We further propose WIRS-BUS to integrate BUS to handle WIRS queries, which handles each update in<jats:italic>O<\/jats:italic>(log<jats:italic>n<\/jats:italic>) time and draws<jats:italic>t<\/jats:italic>independent samples in<jats:italic>O<\/jats:italic>(log<jats:sup>2<\/jats:sup><jats:italic>n + t<\/jats:italic>) time with linear space. Extensive experiments show that our BUS and WIRS-BUS are efficient for both sampling and updates.<\/jats:p>","DOI":"10.14778\/3617838.3617840","type":"journal-article","created":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T17:07:41Z","timestamp":1701709661000},"page":"15-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Efficient Dynamic Weighted Set Sampling and Its Extension"],"prefix":"10.14778","volume":"17","author":[{"given":"Fangyuan","family":"Zhang","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR"}]},{"given":"Mengxu","family":"Jiang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR"}]},{"given":"Sibo","family":"Wang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR"}]}],"member":"320","published-online":{"date-parts":[[2023,12,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2010. Twitter. https:\/\/anlab-kaist.github.io\/traces\/. 2010. Twitter. https:\/\/anlab-kaist.github.io\/traces\/."},{"key":"e_1_2_1_2_1","unstructured":"2010. USA Road Networks. http:\/\/users.diag.uniroma1.it\/challenge9\/download.shtml. 2010. USA Road Networks. http:\/\/users.diag.uniroma1.it\/challenge9\/download.shtml."},{"key":"e_1_2_1_3_1","unstructured":"2013. Delicious. http:\/\/delicious.com\/. 2013. Delicious. http:\/\/delicious.com\/."},{"key":"e_1_2_1_4_1","volume-title":"Parallel Weighted Random Sampling","unstructured":"2019. Parallel Weighted Random Sampling . In ESA, Michael A. Bender, Ola Svensson, and Grzegorz Herman (Eds.), Vol. 144 . 59:1--59:24. 2019. Parallel Weighted Random Sampling. In ESA, Michael A. Bender, Ola Svensson, and Grzegorz Herman (Eds.), Vol. 144. 59:1--59:24."},{"key":"e_1_2_1_5_1","unstructured":"2023. Experiment code and technical report. https:\/\/github.com\/CUHK-DBGroup\/WSS-WIRS. 2023. Experiment code and technical report. https:\/\/github.com\/CUHK-DBGroup\/WSS-WIRS."},{"key":"e_1_2_1_6_1","first-page":"1","article-title":"Independent Range Sampling","volume":"129","author":"Afshani Peyman","year":"2019","unstructured":"Peyman Afshani and Jeff M. Phillips . 2019 . Independent Range Sampling , Revisited Again. In SoCG , Vol. 129. 4: 1 -- 4 :13. Peyman Afshani and Jeff M. Phillips. 2019. Independent Range Sampling, Revisited Again. In SoCG, Vol. 129. 4:1--4:13.","journal-title":"Revisited Again. In SoCG"},{"key":"e_1_2_1_7_1","first-page":"1","article-title":"Independent Range Sampling","volume":"87","author":"Afshani Peyman","year":"2017","unstructured":"Peyman Afshani and Zhewei Wei . 2017 . Independent Range Sampling , Revisited. In ESA , Vol. 87. 3: 1 -- 3 :14. Peyman Afshani and Zhewei Wei. 2017. Independent Range Sampling, Revisited. In ESA, Vol. 87. 3:1--3:14.","journal-title":"Revisited. In ESA"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Sameer Agarwal Barzan Mozafari Aurojit Panda Henry Milner Samuel Madden and Ion Stoica. 2013. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys. 29--42. Sameer Agarwal Barzan Mozafari Aurojit Panda Henry Milner Samuel Madden and Ion Stoica. 2013. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys. 29--42.","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3471485.3471496"},{"key":"e_1_2_1_10_1","article-title":"Sampling a Near Neighbor in High Dimensions - Who is the Fairest of Them All","volume":"47","author":"Aum\u00fcller Martin","year":"2022","unstructured":"Martin Aum\u00fcller , Sariel Har-Peled , Sepideh Mahabadi , Rasmus Pagh , and Francesco Silvestri . 2022 . Sampling a Near Neighbor in High Dimensions - Who is the Fairest of Them All ? ACM Trans. Database Syst. 47 , 1 (2022), 4:1--4:40. Martin Aum\u00fcller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, and Francesco Silvestri. 2022. Sampling a Near Neighbor in High Dimensions - Who is the Fairest of Them All? ACM Trans. Database Syst. 47, 1 (2022), 4:1--4:40.","journal-title":"ACM Trans. Database Syst."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Martin Aum\u00fcller Rasmus Pagh and Francesco Silvestri. 2020. Fair Near Neighbor Search: Independent Range Sampling in High Dimensions. In PODS. 191--204. Martin Aum\u00fcller Rasmus Pagh and Francesco Silvestri. 2020. Fair Near Neighbor Search: Independent Range Sampling in High Dimensions. In PODS. 191--204.","DOI":"10.1145\/3375395.3387648"},{"key":"e_1_2_1_12_1","first-page":"502","article-title":"Scalable Approximation Algorithm for Graph Summarization","volume":"10939","author":"Beg Maham Anwar","year":"2018","unstructured":"Maham Anwar Beg , Muhammad Ahmad , Arif Zaman , and Imdadullah Khan . 2018 . Scalable Approximation Algorithm for Graph Summarization . In PAKDD , Vol. 10939. 502 -- 514 . Maham Anwar Beg, Muhammad Ahmad, Arif Zaman, and Imdadullah Khan. 2018. Scalable Approximation Algorithm for Graph Summarization. In PAKDD, Vol. 10939. 502--514.","journal-title":"PAKDD"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Pawel Brach Alessandro Epasto Alessandro Panconesi and Piotr Sankowski. 2014. Spreading rumours without the network. In COSN. 107--118. Pawel Brach Alessandro Epasto Alessandro Panconesi and Piotr Sankowski. 2014. Spreading rumours without the network. In COSN. 107--118.","DOI":"10.1145\/2660460.2660472"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2015.07.007"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Karl Bringmann and Kasper Green Larsen. 2013. Succinct sampling from discrete distributions. In STOC. 775--782. Karl Bringmann and Kasper Green Larsen. 2013. Succinct sampling from discrete distributions. In STOC. 775--782.","DOI":"10.1145\/2488608.2488707"},{"key":"e_1_2_1_16_1","volume-title":"Narasayya","author":"Chaudhuri Surajit","year":"2001","unstructured":"Surajit Chaudhuri , Gautam Das , Mayur Datar , Rajeev Motwani , and Vivek R . Narasayya . 2001 . Overcoming Limitations of Sampling for Aggregation Queries. In ICDE. 534--542. Surajit Chaudhuri, Gautam Das, Mayur Datar, Rajeev Motwani, and Vivek R. Narasayya. 2001. Overcoming Limitations of Sampling for Aggregation Queries. In ICDE. 534--542."},{"key":"e_1_2_1_17_1","volume-title":"Narasayya","author":"Chaudhuri Surajit","year":"1998","unstructured":"Surajit Chaudhuri , Rajeev Motwani , and Vivek R . Narasayya . 1998 . Random Sampling for Histogram Construction: How much is enough?. In SIGMOD. 436--447. Surajit Chaudhuri, Rajeev Motwani, and Vivek R. Narasayya. 1998. Random Sampling for Histogram Construction: How much is enough?. In SIGMOD. 436--447."},{"key":"e_1_2_1_18_1","volume-title":"Narasayya","author":"Chaudhuri Surajit","year":"1999","unstructured":"Surajit Chaudhuri , Rajeev Motwani , and Vivek R . Narasayya . 1999 . On Random Sampling over Joins. In SIGMOD. 263--274. Surajit Chaudhuri, Rajeev Motwani, and Vivek R. Narasayya. 1999. On Random Sampling over Joins. In SIGMOD. 263--274."},{"key":"e_1_2_1_19_1","volume-title":"Daniel D Von Hoff, and Richard G Posner","author":"Colvin Joshua","year":"2010","unstructured":"Joshua Colvin , Michael I Monine , Ryan N Gutenkunst , William S Hlavacek , Daniel D Von Hoff, and Richard G Posner . 2010 . RuleMonkey: software for stochastic simulation of rule-based models. BMC bioinformatics 11, 1 (2010), 1--14. Joshua Colvin, Michael I Monine, Ryan N Gutenkunst, William S Hlavacek, Daniel D Von Hoff, and Richard G Posner. 2010. RuleMonkey: software for stochastic simulation of rule-based models. BMC bioinformatics 11, 1 (2010), 1--14."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.2c00205"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Pavlos S. Efraimidis. 2015. Weighted Random Sampling over Data Streams. In Algorithms Probability Networks and Games. 183--195. Pavlos S. Efraimidis. 2015. Weighted Random Sampling over Data Streams. In Algorithms Probability Networks and Games. 183--195.","DOI":"10.1007\/978-3-319-24024-4_12"},{"key":"e_1_2_1_22_1","volume-title":"Rivest","author":"Galperin Igal","year":"1993","unstructured":"Igal Galperin and Ronald L . Rivest . 1993 . Scapegoat Trees. In SODA, Vijaya Ramachandran (Ed .). 165--174. Igal Galperin and Ronald L. Rivest. 1993. Scapegoat Trees. In SODA, Vijaya Ramachandran (Ed.). 165--174."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Pankaj Gupta Ashish Goel Jimmy Lin Aneesh Sharma Dong Wang and Reza Zadeh. 2013. WTF: the who to follow service at Twitter. In WWW. 505--514. Pankaj Gupta Ashish Goel Jimmy Lin Aneesh Sharma Dong Wang and Reza Zadeh. 2013. WTF: the who to follow service at Twitter. In WWW. 505--514.","DOI":"10.1145\/2488388.2488433"},{"key":"e_1_2_1_24_1","first-page":"253","article-title":"Maintaining Discrete Probability Distributions Optimally","volume":"700","author":"Hagerup Torben","year":"1993","unstructured":"Torben Hagerup , Kurt Mehlhorn , and J. Ian Munro . 1993 . Maintaining Discrete Probability Distributions Optimally . In ICALP , Vol. 700. 253 -- 264 . Torben Hagerup, Kurt Mehlhorn, and J. Ian Munro. 1993. Maintaining Discrete Probability Distributions Optimally. In ICALP, Vol. 700. 253--264.","journal-title":"ICALP"},{"key":"e_1_2_1_25_1","first-page":"1","volume-title":"Proc. ACM Manag. Data 1","author":"Hou Guanhao","year":"2023","unstructured":"Guanhao Hou , Qintian Guo , Fangyuan Zhang , Sibo Wang , and Zhewei Wei . 2023 . Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme . Proc. ACM Manag. Data 1 , 1 (2023), 25:1--25:26. Guanhao Hou, Qintian Guo, Fangyuan Zhang, Sibo Wang, and Zhewei Wei. 2023. Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme. Proc. ACM Manag. Data 1, 1 (2023), 25:1--25:26."},{"key":"e_1_2_1_26_1","unstructured":"Xiaocheng Hu Miao Qiao and Yufei Tao. 2014. Independent range sampling. In PODS. 246--255. Xiaocheng Hu Miao Qiao and Yufei Tao. 2014. Independent range sampling. In PODS. 246--255."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Lorenz H\u00fcbschle-Schneider and Peter Sanders. 2020. Communication-Efficient Weighted Reservoir Sampling from Fully Distributed Data Streams. In SPAA. ACM 543--545. Lorenz H\u00fcbschle-Schneider and Peter Sanders. 2020. Communication-Efficient Weighted Reservoir Sampling from Fully Distributed Data Streams. In SPAA. ACM 543--545.","DOI":"10.1145\/3350755.3400287"},{"key":"e_1_2_1_28_1","volume-title":"Woodruff","author":"Jayaram Rajesh","year":"2019","unstructured":"Rajesh Jayaram , Gokarna Sharma , Srikanta Tirthapura , and David P . Woodruff . 2019 . Weighted Reservoir Sampling from Distributed Streams. In PODS. 218--235. Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, and David P. Woodruff. 2019. Weighted Reservoir Sampling from Distributed Streams. In PODS. 218--235."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00033-X"},{"key":"e_1_2_1_30_1","unstructured":"Wenqing Lin. 2019. Distributed Algorithms for Fully Personalized PageRank on Large Graphs. In WWW. 1084--1094. Wenqing Lin. 2019. Distributed Algorithms for Fully Personalized PageRank on Large Graphs. In WWW. 1084--1094."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-003-1078-6"},{"key":"e_1_2_1_32_1","volume-title":"Mathieu Salzmann, and Lars Petersson.","author":"Najafi Mohammad","year":"2016","unstructured":"Mohammad Najafi , Sarah Taghavi Namin , Mathieu Salzmann, and Lars Petersson. 2016 . Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering. In CVPR. IEEE Computer Society , 607--615. Mohammad Najafi, Sarah Taghavi Namin, Mathieu Salzmann, and Lars Petersson. 2016. Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering. In CVPR. IEEE Computer Society, 607--615."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/0202005"},{"key":"e_1_2_1_34_1","unstructured":"Frank Olken and Doron Rotem. 1986. Simple Random Sampling from Relational Databases. In VLDB. 160--169. Frank Olken and Doron Rotem. 1986. Simple Random Sampling from Relational Databases. In VLDB. 160--169."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Frank Olken and Doron Rotem. 1993. Sampling from Spatial Databases. In ICDE. 199--208. Frank Olken and Doron Rotem. 1993. Sampling from Spatial Databases. In ICDE. 199--208.","DOI":"10.1109\/ICDE.1993.344062"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00140664"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.002"},{"key":"e_1_2_1_40_1","volume-title":"Eye blink detection for different driver states in conditionally automated driving and manual driving using EOG and a driver camera. Behavior research methods 50","author":"Schmidt J\u00fcrgen","year":"2018","unstructured":"J\u00fcrgen Schmidt , Rihab Laarousi , Wolfgang Stolzmann , and Katja Karrer-Gau\u00df . 2018. Eye blink detection for different driver states in conditionally automated driving and manual driving using EOG and a driver camera. Behavior research methods 50 ( 2018 ), 1088--1101. J\u00fcrgen Schmidt, Rihab Laarousi, Wolfgang Stolzmann, and Katja Karrer-Gau\u00df. 2018. Eye blink detection for different driver states in conditionally automated driving and manual driving using EOG and a driver camera. Behavior research methods 50 (2018), 1088--1101."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Yufei Tao. 2022. Algorithmic Techniques for Independent Query Sampling. In PODS. 129--138. Yufei Tao. 2022. Algorithmic Techniques for Independent Query Sampling. In PODS. 129--138.","DOI":"10.1145\/3517804.3526068"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/355744.355749"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850584"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3021924.3021936"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Sibo Wang and Yufei Tao. 2018. Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks. In SIGMOD. 1113--1127. Sibo Wang and Yufei Tao. 2018. Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks. In SIGMOD. 1113--1127.","DOI":"10.1145\/3183713.3196919"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360902"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098072"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Dong Xie Jeff M. Phillips Michael Matheny and Feifei Li. 2021. Spatial Independent Range Sampling. In SIGMOD. 2023--2035. Dong Xie Jeff M. Phillips Michael Matheny and Feifei Li. 2021. Spatial Independent Range Sampling. In SIGMOD. 2023--2035.","DOI":"10.1145\/3448016.3452806"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Zhuoyue Zhao Robert Christensen Feifei Li Xiao Hu and Ke Yi. 2018. Random Sampling over Joins Revisited. In SIGMOD. ACM 1525--1539. Zhuoyue Zhao Robert Christensen Feifei Li Xiao Hu and Ke Yi. 2018. Random Sampling over Joins Revisited. In SIGMOD. ACM 1525--1539.","DOI":"10.1145\/3183713.3183739"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3617838.3617840","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T22:57:38Z","timestamp":1730761058000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3617838.3617840"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["10.14778\/3617838.3617840"],"URL":"https:\/\/doi.org\/10.14778\/3617838.3617840","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,9]]},"assertion":[{"value":"2023-12-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}