{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T05:46:32Z","timestamp":1751521592895,"version":"3.41.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,8,20]],"date-time":"2021-08-20T00:00:00Z","timestamp":1629417600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61672143"],"award-info":[{"award-number":["61672143"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005047","name":"Natural Science Foundation of Liaoning Province","doi-asserted-by":"crossref","award":["2020-BS-054"],"award-info":[{"award-number":["2020-BS-054"]}],"id":[{"id":"10.13039\/501100005047","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM\/IMS Trans. Data Sci."],"published-print":{"date-parts":[[2021,8,31]]},"abstract":"<jats:p>In big data query processing, there is a trade-off between query accuracy and query efficiency, for example, sampling query approaches trade-off query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly losing the possibility of query completeness, that is, the chance that a query is complete. To quantify the possibility, we define a new concept, Probability of query Completeness (hereinafter referred to as PC). For example, If a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among 100 results. Leveraging the probabilistic data placement and scanning, we trade off PC for query performance. In the article, we propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. The experimental results conducted on HiBench prove that PoBery can significantly accelerate queries while ensuring the PC. Specifically, it is guaranteed that the percentage of complete queries is larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7 \u00d7, 1.1 \u00d7, and 1.5 \u00d7 faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.<\/jats:p>","DOI":"10.1145\/3465375","type":"journal-article","created":{"date-parts":[[2021,8,21]],"date-time":"2021-08-21T04:37:25Z","timestamp":1629520645000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0704-3217","authenticated-orcid":false,"given":"Jie","family":"Song","sequence":"first","affiliation":[{"name":"Northeastern University, Shenyang, Liaoning Province, China"}]},{"given":"Qiang","family":"He","sequence":"additional","affiliation":[{"name":"Swinburne University of Technology, Hawthorn, Victoria, Australia"}]},{"given":"Feifei","family":"Chen","sequence":"additional","affiliation":[{"name":"Deakin University, Docklands, Victoria, Australia"}]},{"given":"Ye","family":"Yuan","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, Liaoning Province, China"}]},{"given":"Ge","family":"Yu","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, Liaoning Province, China"}]}],"member":"320","published-online":{"date-parts":[[2021,8,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694730.2694731"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of SIGMOD\u201918","author":"Chandramouli Badrish","year":"2018","unstructured":"Badrish Chandramouli , Guna Prasaad , Donald Kossmann , Justin Levandoski , James Hunter , and Mike Barnett . 2018 . FASTER: A concurrent key-value store with in-place updates . In Proceedings of SIGMOD\u201918 . New York, NY, 275\u2013290. Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A concurrent key-value store with in-place updates. In Proceedings of SIGMOD\u201918. New York, NY, 275\u2013290."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","first-page":"1225","DOI":"10.1109\/TCYB.2013.2289351","article-title":"Robust hashing with local models for approximate similarity search","volume":"44","author":"Jingkuan Song","year":"2014","unstructured":"Song Jingkuan , Yi Yang , Xuelong Li , Zi Huang , and Yang Yang . 2014 . Robust hashing with local models for approximate similarity search . IEEE Transactions on Cybernetics 44 , 7 (2014), 1225 \u2013 1236 . Song Jingkuan, Yi Yang, Xuelong Li, Zi Huang, and Yang Yang. 2014. Robust hashing with local models for approximate similarity search. IEEE Transactions on Cybernetics 44, 7 (2014), 1225\u20131236.","journal-title":"IEEE Transactions on Cybernetics"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872822"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1089\/big.2013.0011"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 7th Biennial Conference on Innovative Data Systems Research.","author":"M. Kornacker","year":"2015","unstructured":"M. Kornacker et al. 2015 . Impala: A modern, open-source SQL engine for Hadoop . In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research. M. Kornacker et al. 2015. Impala: A modern, open-source SQL engine for Hadoop. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","first-page":"1626","DOI":"10.14778\/1687553.1687609","article-title":"Hive: A warehousing solution over a map-reduce framework","volume":"2","author":"Ashish Thusoo","year":"2009","unstructured":"Thusoo Ashish , Joydeep Sen Sarma , Namit Jain , Zheng Shao , Prasad Chakka , Suresh Anthony , Hao Liu , Pete Wyckoff , and Raghotham Murthy . 2009 . Hive: A warehousing solution over a map-reduce framework . Proceedings of the VLDB Endowment 2 , 2 (2009), 1626 \u2013 1629 . Thusoo Ashish, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: A warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment 2, 2 (2009), 1626\u20131629.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1357\u20131369","author":"Bikas Saha","year":"2015","unstructured":"Saha Bikas , Hitesh Shah , Siddharth Seth , Gopal Vijayaraghavan , Arun Murthy , and Carlo Curino . 2015 . Apache tez: A unifying framework for modeling and building data processing applications . In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1357\u20131369 . Saha Bikas, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun Murthy, and Carlo Curino. 2015. Apache tez: A unifying framework for modeling and building data processing applications. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1357\u20131369."},{"key":"e_1_2_1_9_1","first-page":"154","article-title":"Examination and comparison of conflicting data in granulated datasets: Equal width interval vs","volume":"239","author":"Wu Chien-Hsing","year":"2013","unstructured":"Chien-Hsing Wu , Shu-Chen Kao , and Koji Okuhara . 2013 . Examination and comparison of conflicting data in granulated datasets: Equal width interval vs . Equal Frequency Interval. Inf. Sci. 239 (2013), 154 \u2013 164 . Chien-Hsing Wu, Shu-Chen Kao, and Koji Okuhara. 2013. Examination and comparison of conflicting data in granulated datasets: Equal width interval vs. Equal Frequency Interval. Inf. Sci. 239 (2013), 154\u2013164.","journal-title":"Equal Frequency Interval. Inf. Sci."},{"key":"e_1_2_1_10_1","unstructured":"Amdahl's law. Retrieved June 22 2021 from https:\/\/en.wikipedia.org\/wiki\/Amdahl%27s_law.  Amdahl's law. Retrieved June 22 2021 from https:\/\/en.wikipedia.org\/wiki\/Amdahl%27s_law."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10723-016-9370-2"},{"key":"e_1_2_1_12_1","unstructured":"Apache Hadoop. Retrieved June 22 2021 from https:\/\/hadoop.apache.org\/.  Apache Hadoop. Retrieved June 22 2021 from https:\/\/hadoop.apache.org\/."},{"volume-title":"The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. New Frontiers in Information and Software as Services","author":"Shengsheng Huang","key":"e_1_2_1_13_1","unstructured":"Huang Shengsheng , Huang Jie , Dai Jinquan , Xie Tao , and Huang. Bo. 2011. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. New Frontiers in Information and Software as Services . Springer , Berlin . Huang Shengsheng, Huang Jie, Dai Jinquan, Xie Tao, and Huang. Bo. 2011. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. New Frontiers in Information and Software as Services. Springer, Berlin."},{"key":"e_1_2_1_14_1","volume-title":"Retrieved","author":"O'Malley Owen","year":"2008","unstructured":"Owen O'Malley . 2008 . Terabyte sort on Apache Hadoop . Retrieved June 22, 2021 from http:\/\/sortbenchmark.org\/Yahoo-Hadoop.pdf. Owen O'Malley. 2008. Terabyte sort on Apache Hadoop. Retrieved June 22, 2021 from http:\/\/sortbenchmark.org\/Yahoo-Hadoop.pdf."},{"key":"e_1_2_1_15_1","first-page":"30","article-title":"Performance and energy optimization on Terasort algorithm by task self-resizing","volume":"44","author":"Shu Xu Jie Song","year":"2015","unstructured":"Jie Song Shu Xu , Li Zhang , Claus Pahl , and Ge Yu . 2015 . Performance and energy optimization on Terasort algorithm by task self-resizing . Inf. Technol. Control 44 , 1 (2015), 30 \u2013 40 . Jie Song Shu Xu, Li Zhang, Claus Pahl, and Ge Yu. 2015. Performance and energy optimization on Terasort algorithm by task self-resizing. Inf. Technol. Control 44, 1 (2015), 30\u201340.","journal-title":"Inf. Technol. Control"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.14778\/2732977.2733002","article-title":"SQL-on-Hadoop: full circle back to shared-nothing database architectures","volume":"7","author":"Avrilia Floratou","year":"2014","unstructured":"Floratou Avrilia , Umar Farooq Minhas , and Fatma \u00d6zcan . 2014 . SQL-on-Hadoop: full circle back to shared-nothing database architectures . Proceedings of the VLDB Endowment 7 , 12 (2014), 1295 \u2013 1306 . Floratou Avrilia, Umar Farooq Minhas, and Fatma \u00d6zcan. 2014. SQL-on-Hadoop: full circle back to shared-nothing database architectures. Proceedings of the VLDB Endowment 7, 12 (2014), 1295\u20131306.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_17_1","unstructured":"Apache Parquet. Retrieved June 22 2021 from http:\/\/parquet.apache.org\/.  Apache Parquet. Retrieved June 22 2021 from http:\/\/parquet.apache.org\/."},{"key":"e_1_2_1_18_1","volume-title":"Companion of the 2018 ACM\/SPEC International Conference on Performance Engineering. ACM, 147\u2013152","author":"Ignacio Requeno Jos\u00e9","year":"2018","unstructured":"Requeno Jos\u00e9 Ignacio , I\u00f1igo Gasc\u00f3n , and Jos\u00e9 Merseguer . 2018 . Towards the performance analysis of Apache tez applications . In Companion of the 2018 ACM\/SPEC International Conference on Performance Engineering. ACM, 147\u2013152 . Requeno Jos\u00e9 Ignacio, I\u00f1igo Gasc\u00f3n, and Jos\u00e9 Merseguer. 2018. Towards the performance analysis of Apache tez applications. In Companion of the 2018 ACM\/SPEC International Conference on Performance Engineering. ACM, 147\u2013152."},{"key":"e_1_2_1_19_1","volume-title":"Retrieved","author":"Prokopp Christian","year":"2014","unstructured":"Christian Prokopp . 2014 . ORC: An intelligent big data file format for Hadoop and Hive . Retrieved June 22, 2021 from https:\/\/www.semantikoz.com\/blog\/orc-intelligent-big-data-file-format-hadoop-hive\/. Christian Prokopp. 2014. ORC: An intelligent big data file format for Hadoop and Hive. Retrieved June 22, 2021 from https:\/\/www.semantikoz.com\/blog\/orc-intelligent-big-data-file-format-hadoop-hive\/."},{"key":"e_1_2_1_20_1","volume-title":"IEEE International Conference on Big Data (Big Data\u201917)","author":"Ashish Tapdiya","year":"2017","unstructured":"Tapdiya Ashish and Daniel Fabbri . 2017 . A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics . In IEEE International Conference on Big Data (Big Data\u201917) . IEEE, 1349\u20131356. Tapdiya Ashish and Daniel Fabbri. 2017. A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics. In IEEE International Conference on Big Data (Big Data\u201917). IEEE, 1349\u20131356."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 481\u2013492","author":"Sameer Agarwal","year":"2014","unstructured":"Agarwal Sameer , Henry Milner , Ariel Kleiner , Ameet Talwalkar , Michael Jordan , Samuel Madden , Barzan Mozafari , and Ion Stoica . 2014 . Knowing when you're wrong: Building fast and reliable approximate query processing systems . In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 481\u2013492 . Agarwal Sameer, Henry Milner, Ariel Kleiner, Ameet Talwalkar, Michael Jordan, Samuel Madden, Barzan Mozafari, and Ion Stoica. 2014. Knowing when you're wrong: Building fast and reliable approximate query processing systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 481\u2013492."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1109\/TKDE.2015.2496270","article-title":"Merlin: Exploratory analysis with imprecise queries","volume":"28","author":"Bahar Qarabaqi","year":"2016","unstructured":"Qarabaqi Bahar and Mirek Riedewald . 2016 . Merlin: Exploratory analysis with imprecise queries . IEEE Transactions on Knowledge and Data Engineering 28 , 2 (2016), 342 \u2013 355 . Qarabaqi Bahar and Mirek Riedewald. 2016. Merlin: Exploratory analysis with imprecise queries. IEEE Transactions on Knowledge and Data Engineering 28, 2 (2016), 342\u2013355.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 211\u2013223","author":"Paolo Guagliardo","year":"2016","unstructured":"Guagliardo Paolo and Leonid Libkin . 2016 . Making SQL queries correct on incomplete databases: A feasibility study . In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 211\u2013223 . Guagliardo Paolo and Leonid Libkin. 2016. Making SQL queries correct on incomplete databases: A feasibility study. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 211\u2013223."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 517\u2013528","author":"Kyriaki Dimitriadou","year":"2014","unstructured":"Dimitriadou Kyriaki , Olga Papaemmanouil , and Yanlei Diao . 2014 . Explore-by-example: An automatic query steering framework for interactive data exploration . In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 517\u2013528 . Dimitriadou Kyriaki, Olga Papaemmanouil, and Yanlei Diao. 2014. Explore-by-example: An automatic query steering framework for interactive data exploration. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 517\u2013528."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 26th International Conference on Scientific and Statistical Database Management. ACM, 15","author":"Khan Hina A.","year":"2014","unstructured":"Hina A. Khan , Mohamed A. Sharaf , and Abdullah Albarrak . 2014 . DivIDE: Efficient diversification for interactive data exploration . In Proceedings of the 26th International Conference on Scientific and Statistical Database Management. ACM, 15 . Hina A. Khan, Mohamed A. Sharaf, and Abdullah Albarrak. 2014. DivIDE: Efficient diversification for interactive data exploration. In Proceedings of the 26th International Conference on Scientific and Statistical Database Management. ACM, 15."},{"key":"e_1_2_1_26_1","volume-title":"Retrieved","author":"Oracle Inc.","year":"2018","unstructured":"Oracle Inc. 2018 . Query relaxation . Retrieved June 22, 2021 from https:\/\/docs.oracle.com\/database\/121\/CCAPP\/GUID-7DD2AF6B-88FD-40B7-A522-3F59309D3B35.htm. Oracle Inc. 2018. Query relaxation. Retrieved June 22, 2021 from https:\/\/docs.oracle.com\/database\/121\/CCAPP\/GUID-7DD2AF6B-88FD-40B7-A522-3F59309D3B35.htm."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1095\u20131098","author":"Davide Mottin","year":"2014","unstructured":"Mottin Davide , Alice Marascu , Senjuti Basu Roy , Gautam Das , Themis Palpanas , and Yannis Velegrakis . 2014 . IQR: An interactive query relaxation system for the empty-answer problem . In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1095\u20131098 . Mottin Davide, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, and Yannis Velegrakis. 2014. IQR: An interactive query relaxation system for the empty-answer problem. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1095\u20131098."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 473\u2013482","author":"Verena Kantere","year":"2015","unstructured":"Kantere Verena , George Orfanoudakis , Anastasios Kementsietsidis , and Timos Sellis . 2015 . Query relaxation across heterogeneous data sources . In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 473\u2013482 . Kantere Verena, George Orfanoudakis, Anastasios Kementsietsidis, and Timos Sellis. 2015. Query relaxation across heterogeneous data sources. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 473\u2013482."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253291"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.781635"},{"key":"e_1_2_1_31_1","first-page":"1","article-title":"Interactive data exploration with smart drill-down (extended version)","volume":"1","author":"Manas Joglekar","year":"2017","unstructured":"Joglekar Manas , Hector Garcia-Molina , and Aditya G. Parameswaran . 2017 . Interactive data exploration with smart drill-down (extended version) . IEEE Transactions on Knowledge & Data Engineering 1 (2017), 1 \u2013 1 . Joglekar Manas, Hector Garcia-Molina, and Aditya G. Parameswaran. 2017. Interactive data exploration with smart drill-down (extended version). IEEE Transactions on Knowledge & Data Engineering 1 (2017), 1\u20131.","journal-title":"IEEE Transactions on Knowledge & Data Engineering"},{"key":"e_1_2_1_32_1","volume-title":"Incomplete data management: A survey. Frontiers of Computer Science","author":"Xiaoye Miao","year":"2017","unstructured":"Miao Xiaoye , Yunjun Gao , Su Guo , and Wanqi Liu . 2017. Incomplete data management: A survey. Frontiers of Computer Science ( 2017 ), 1\u201322. Miao Xiaoye, Yunjun Gao, Su Guo, and Wanqi Liu. 2017. Incomplete data management: A survey. Frontiers of Computer Science (2017), 1\u201322."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, 713\u2013724","author":"Lyublena Antova","year":"2007","unstructured":"Antova Lyublena , Christoph Koch , and Dan Olteanu . 2007 . From complete to incomplete information and back . In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, 713\u2013724 . Antova Lyublena, Christoph Koch, and Dan Olteanu. 2007. From complete to incomplete information and back. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, 713\u2013724."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1109\/TKDE.2015.2460742","article-title":"Top-k dominating queries on incomplete data","volume":"28","author":"Xiaoye Miao","year":"2016","unstructured":"Miao Xiaoye , Yunjun Gao , Baihua Zheng , Gang Chen , and Huiyong Cui . 2016 . Top-k dominating queries on incomplete data . IEEE Transactions on Knowledge and Data Engineering 28 , 1 (2016), 252 \u2013 266 . Miao Xiaoye, Yunjun Gao, Baihua Zheng, Gang Chen, and Huiyong Cui. 2016. Top-k dominating queries on incomplete data. IEEE Transactions on Knowledge and Data Engineering 28, 1 (2016), 252\u2013266.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","first-page":"1349","DOI":"10.1109\/TFUZZ.2016.2516562","article-title":"Processing incomplete k nearest neighbor search","volume":"24","author":"Xiaoye Miao","year":"2016","unstructured":"Miao Xiaoye , Yunjun Gao , Gang Chen , Baihua Zheng , and Huiyong Cui . 2016 . Processing incomplete k nearest neighbor search . IEEE Transactions on Fuzzy Systems 24 , 6 (2016), 1349 \u2013 1363 . Miao Xiaoye, Yunjun Gao, Gang Chen, Baihua Zheng, and Huiyong Cui. 2016. Processing incomplete k nearest neighbor search. IEEE Transactions on Fuzzy Systems 24, 6 (2016), 1349\u20131363.","journal-title":"IEEE Transactions on Fuzzy Systems"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1109\/TKDE.2013.14","article-title":"Searching dimension incomplete databases","volume":"26","author":"Wei Cheng","year":"2014","unstructured":"Cheng Wei , Xiaoming Jin , Jian-Tao Sun , Xuemin Lin , Xiang Zhang , and Wei Wang . 2014 . Searching dimension incomplete databases . IEEE Transactions on Knowledge and Data Engineering 26 , 3 (2014), 725 \u2013 738 . Cheng Wei, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang. 2014. Searching dimension incomplete databases. IEEE Transactions on Knowledge and Data Engineering 26, 3 (2014), 725\u2013738.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1476589.1476628"},{"key":"e_1_2_1_38_1","first-page":"1137","article-title":"A study of cross-validation and bootstrap for accuracy estimation and model selection","volume":"14","author":"Kohavi Ron","year":"1995","unstructured":"Ron Kohavi . 1995 . A study of cross-validation and bootstrap for accuracy estimation and model selection . Ijcai 14 , 2 (1995), 1137 \u2013 1145 . Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai 14, 2 (1995), 1137\u20131145.","journal-title":"Ijcai"},{"key":"e_1_2_1_39_1","volume-title":"IEEE 32nd International Conference on Data Engineering (ICDE\u201916)","author":"Yongjoo Park","year":"2016","unstructured":"Park Yongjoo , Michael Cafarella , and Barzan Mozafari . 2016 . Visualization-aware sampling for very large databases . In IEEE 32nd International Conference on Data Engineering (ICDE\u201916) . IEEE, 755\u2013766. Park Yongjoo, Michael Cafarella, and Barzan Mozafari. 2016. Visualization-aware sampling for very large databases. In IEEE 32nd International Conference on Data Engineering (ICDE\u201916). IEEE, 755\u2013766."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"521","DOI":"10.14778\/2735479.2735485","article-title":"Rapid sampling for visualizations with ordering guarantees","volume":"8","author":"Albert Kim","year":"2015","unstructured":"Kim Albert , Eric Blais , Aditya Parameswaran , Piotr Indyk , Sam Madden , and Ronitt Rubinfeld . 2015 . Rapid sampling for visualizations with ordering guarantees . Proceedings of the VLDB Endowment 8 , 5 (2015), 521 \u2013 532 . Kim Albert, Eric Blais, Aditya Parameswaran, Piotr Indyk, Sam Madden, and Ronitt Rubinfeld. 2015. Rapid sampling for visualizations with ordering guarantees. Proceedings of the VLDB Endowment 8, 5 (2015), 521\u2013532.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_41_1","volume-title":"Retrieved","author":"Michael Armbrust","year":"2017","unstructured":"Armbrust Michael , Bill Chambers , and Matei Zaharia . 2017 . Databricks delta: A unified data management system for real-time big data . Retrieved June 30, 2021 from https:\/\/databricks.com\/blog\/2017\/10\/25\/databricks-delta-a-unified-management-system-for-real-time-big-data.html. Armbrust Michael, Bill Chambers, and Matei Zaharia. 2017. Databricks delta: A unified data management system for real-time big data. Retrieved June 30, 2021 from https:\/\/databricks.com\/blog\/2017\/10\/25\/databricks-delta-a-unified-management-system-for-real-time-big-data.html."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 2016 International Conference on Management of Data. ACM, 2153\u20132156","author":"Jags Ramnarayan","year":"2016","unstructured":"Ramnarayan Jags , Barzan Mozafari , Sumedh Wale , Sudhir Menon , Neeraj Kumar , Hemant Bhanawat , Soubhik Chakraborty , Yogesh Mahajan , Rishitesh Mishra , and Kishor Bachhav . 2016 . SnappyData: A hybrid transactional analytical store built on spark . In Proceedings of the 2016 International Conference on Management of Data. ACM, 2153\u20132156 . Ramnarayan Jags, Barzan Mozafari, Sumedh Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh Mahajan, Rishitesh Mishra, and Kishor Bachhav. 2016. SnappyData: A hybrid transactional analytical store built on spark. In Proceedings of the 2016 International Conference on Management of Data. ACM, 2153\u20132156."},{"key":"e_1_2_1_43_1","volume-title":"Retrieved","author":"Presto","year":"2021","unstructured":"Presto : Distributed SQL query engine for big data . Retrieved June 22, 2021 from https:\/\/prestodb.io\/. Presto: Distributed SQL query engine for big data. Retrieved June 22, 2021 from https:\/\/prestodb.io\/."},{"key":"e_1_2_1_44_1","volume-title":"Efficient query evaluation on probabilistic databases. The VLDB Journal\u2014The International Journal on Very Large Data Bases 16, 4","author":"Nilesh Dalvi","year":"2007","unstructured":"Dalvi Nilesh and Dan Suciu . 2007. Efficient query evaluation on probabilistic databases. The VLDB Journal\u2014The International Journal on Very Large Data Bases 16, 4 ( 2007 ), 523\u2013544. Dalvi Nilesh and Dan Suciu. 2007. Efficient query evaluation on probabilistic databases. The VLDB Journal\u2014The International Journal on Very Large Data Bases 16, 4 (2007), 523\u2013544."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1067\u20131070","author":"Kai Zeng","year":"2014","unstructured":"Zeng Kai , Shi Gao , Jiaqi Gu , Barzan Mozafari , and Carlo Zaniolo . 2014 . ABS: A system for scalable approximate queries with accuracy guarantees . In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1067\u20131070 . Zeng Kai, Shi Gao, Jiaqi Gu, Barzan Mozafari, and Carlo Zaniolo. 2014. ABS: A system for scalable approximate queries with accuracy guarantees. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1067\u20131070."},{"key":"e_1_2_1_46_1","volume-title":"Gibbons","author":"Garofalakis Minos N.","year":"2001","unstructured":"Minos N. Garofalakis and Phillip B . Gibbons . 2001 . Approximate query processing: Taming the terabytes. In VLDB. 343\u2013352. Minos N. Garofalakis and Phillip B. Gibbons. 2001. Approximate query processing: Taming the terabytes. In VLDB. 343\u2013352."},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/1900000004","article-title":"Synopses for massive data: Samples, histograms, wavelets, sketches","volume":"4","author":"Graham Cormode","year":"2011","unstructured":"Cormode Graham , Minos Garofalakis , Peter J. Haas , and Chris Jermaine . 2011 . Synopses for massive data: Samples, histograms, wavelets, sketches . Foundations and Trends in Databases 4 , 1 \u2013 3 (2011), 1\u2013294. Cormode Graham, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. 2011. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases 4, 1\u20133 (2011), 1\u2013294.","journal-title":"Foundations and Trends in Databases"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 511\u2013519","author":"Surajit Chaudhuri","year":"2017","unstructured":"Chaudhuri Surajit , Bolin Ding , and Srikanth Kandula . 2017 . Approximate query processing: No silver bullet . In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 511\u2013519 . Chaudhuri Surajit, Bolin Ding, and Srikanth Kandula. 2017. Approximate query processing: No silver bullet. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 511\u2013519."},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1007\/s10115-013-0638-6","article-title":"A survey of queries over uncertain data","volume":"37","author":"Yijie Wang","year":"2013","unstructured":"Wang Yijie , Xiaoyong Li , Xiaoling Li , and Yuan Wang . 2013 . A survey of queries over uncertain data . Knowledge and Information Systems 37 , 3 (2013), 485 \u2013 530 . Wang Yijie, Xiaoyong Li, Xiaoling Li, and Yuan Wang. 2013. A survey of queries over uncertain data. Knowledge and Information Systems 37, 3 (2013), 485\u2013530.","journal-title":"Knowledge and Information Systems"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 22nd International Conference on Data Engineering (ICDE\u201906)","author":"Das Sarma Anish","year":"2006","unstructured":"Sarma Anish Das , Omar Benjelloun , Alon Halevy , and Jennifer Widom . 2006 . Working models for uncertain data . In Proceedings of the 22nd International Conference on Data Engineering (ICDE\u201906) . IEEE, 7\u20137. Sarma Anish Das, Omar Benjelloun, Alon Halevy, and Jennifer Widom. 2006. Working models for uncertain data. In Proceedings of the 22nd International Conference on Data Engineering (ICDE\u201906). IEEE, 7\u20137."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386118.1386119"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.is.2005.06.002","article-title":"Evaluation of probabilistic queries over imprecise data in constantly-evolving environments","volume":"32","author":"Reynold Cheng","year":"2007","unstructured":"Cheng Reynold , Dmitri V. Kalashnikov , and Sunil Prabhakar . 2007 . Evaluation of probabilistic queries over imprecise data in constantly-evolving environments . Information Systems 32 , 1 (2007), 104 \u2013 130 . Cheng Reynold, Dmitri V. Kalashnikov, and Sunil Prabhakar. 2007. Evaluation of probabilistic queries over imprecise data in constantly-evolving environments. Information Systems 32, 1 (2007), 104\u2013130.","journal-title":"Information Systems"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1109\/TKDE.2004.46","article-title":"Querying imprecise data in moving object environments","volume":"16","author":"Reynold Cheng","year":"2004","unstructured":"Cheng Reynold , Dmitri V. Kalashnikov , and Sunil Prabhakar . 2004 . Querying imprecise data in moving object environments . IEEE Transactions on Knowledge and Data Engineering 16 , 9 (2004), 1112 \u2013 1127 . Cheng Reynold, Dmitri V. Kalashnikov, and Sunil Prabhakar. 2004. Querying imprecise data in moving object environments. IEEE Transactions on Knowledge and Data Engineering 16, 9 (2004), 1112\u20131127.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"}],"container-title":["ACM\/IMS Transactions on Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3465375","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3465375","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:12Z","timestamp":1750191432000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3465375"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,20]]},"references-count":53,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,8,31]]}},"alternative-id":["10.1145\/3465375"],"URL":"https:\/\/doi.org\/10.1145\/3465375","relation":{},"ISSN":["2691-1922"],"issn-type":[{"type":"print","value":"2691-1922"}],"subject":[],"published":{"date-parts":[[2021,8,20]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}