{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T16:50:43Z","timestamp":1742403043276},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"1-2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2010,9]]},"abstract":"<jats:p>\n            Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, despite the fact that clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (\n            <jats:italic>Uncertain Primary Index<\/jats:italic>\n            ) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions.\n          <\/jats:p>\n          <jats:p>Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To prevent the size of the UPI from becoming unmanageable, its size is kept small by placing low-probability tuples in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs to estimate query performance in various settings to help automatically select tuning parameters of a UPI.<\/jats:p>\n          <jats:p>We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (up to two orders of magnitude) improve the performance of uncertain queries both over clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as or even lower than using an unclustered heap file.<\/jats:p>","DOI":"10.14778\/1920841.1920922","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"630-637","source":"Crossref","is-referenced-by-count":7,"title":["UPI"],"prefix":"10.14778","volume":"3","author":[{"given":"Hideaki","family":"Kimura","sequence":"first","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel","family":"Madden","sequence":"additional","affiliation":[{"name":"MIT CSAIL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stanley B.","family":"Zdonik","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2010,9]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"www.cse.cuhk.edu.hk\/~taoyf\/paper\/tods07-utree.html.  www.cse.cuhk.edu.hk\/~taoyf\/paper\/tods07-utree.html."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559795.1559816"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2007.03.001"},{"key":"e_1_2_1_4_1","volume-title":"VLDB","author":"Benjelloun O.","year":"2006","unstructured":"O. Benjelloun , A. D. Sarma , A. Halevy , and J. Widom . ULDBs: databases with uncertainty and lineage . In VLDB , 2006 . O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. ULDBs: databases with uncertainty and lineage. In VLDB, 2006."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316765"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1538788.1538810"},{"key":"e_1_2_1_7_1","volume-title":"Fractured Indexes: Improved B-trees To Reduce Maintenance Cost And Fragmentation. Master's thesis","author":"Ikhariale N.","year":"2010","unstructured":"N. Ikhariale . Fractured Indexes: Improved B-trees To Reduce Maintenance Cost And Fragmentation. Master's thesis , Brown University , 2010 . N. Ikhariale. Fractured Indexes: Improved B-trees To Reduce Maintenance Cost And Fragmentation. Master's thesis, Brown University, 2010."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-005-0171-7"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687765"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"M. Ley. DBLP - Some Lessons Learned. PVLDB 2009.   M. Ley. DBLP - Some Lessons Learned. PVLDB 2009.","DOI":"10.14778\/1687553.1687577"},{"key":"e_1_2_1_11_1","first-page":"1","volume-title":"Researcher affiliation extraction from homepages. ACL-IJCNLP","author":"Nagy I.","year":"2009","unstructured":"I. Nagy , R. Farkas , and M. Jelasity . Researcher affiliation extraction from homepages. ACL-IJCNLP 2009 , page 1 , 2009. I. Nagy, R. Farkas, and M. Jelasity. Researcher affiliation extraction from homepages. ACL-IJCNLP 2009, page 1, 2009."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367907"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367935"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1388240.1388260"},{"key":"e_1_2_1_16_1","volume-title":"VLDB","author":"Tao Y.","year":"2005","unstructured":"Y. Tao , R. Cheng , X. Xiao , W. Ngai , B. Kao , and S. Prabhakar . Indexing multi-dimensional uncertain data with arbitrary probability density functions . In VLDB , 2005 . Y. Tao, R. Cheng, X. Xiao, W. Ngai, B. Kao, and S. Prabhakar. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB, 2005."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.77"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1920841.1920922","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:53:04Z","timestamp":1672228384000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1920841.1920922"}},"subtitle":["a primary index for uncertain databases"],"short-title":[],"issued":{"date-parts":[[2010,9]]},"references-count":17,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2010,9]]}},"alternative-id":["10.14778\/1920841.1920922"],"URL":"https:\/\/doi.org\/10.14778\/1920841.1920922","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2010,9]]}}}