{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,2,19]],"date-time":"2023-02-19T02:26:56Z","timestamp":1676773616513},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>We consider the problem of how to best parallelize range queries in a massive scale distributed database. In traditional systems the focus has been on maximizing parallelism, for example by laying out data to achieve the highest throughput. However, in a massive scale database such as our PNUTS system [11] or BigTable [10], maximizing parallelism is not necessarily the best strategy: the system has more than enough servers to saturate a single client by returning results faster than the client can consume them, and when there are multiple concurrent queries, maximizing parallelism for all of them will cause disk contention, reducing everybody's performance. How can we find the right parallelism level for each query in order to achieve high, consistent throughput for all queries? We propose an adaptive approach with two aspects. First, we adaptively determine the ideal parallelism for a single query execution, which is the minimum number of parallel scanning servers needed to satisfy the client, depending on query selectivity, client load, client-server bandwidth, and so on. Second, we adaptively schedule which servers will be assigned to different query executions, to minimize disk contention on servers and ensure that all queries receive good performance. Our scheduler can be tuned based on different policies, such as favoring short versus long queries or high versus low priority queries. An experimental study demonstrates the effectiveness of our techniques in the PNUTS system.<\/jats:p>","DOI":"10.14778\/1687627.1687705","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"682-693","source":"Crossref","is-referenced-by-count":9,"title":["Adaptively parallelizing distributed range queries"],"prefix":"10.14778","volume":"2","author":[{"given":"Ymir","family":"Vigfusson","sequence":"first","affiliation":[{"name":"Cornell University"}]},{"given":"Adam","family":"Silberstein","sequence":"additional","affiliation":[{"name":"Yahoo! Research"}]},{"given":"Brian F.","family":"Cooper","sequence":"additional","affiliation":[{"name":"Yahoo! Research"}]},{"given":"Rodrigo","family":"Fonseca","sequence":"additional","affiliation":[{"name":"Yahoo! Research"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon SimpleDB. aws.amazon.com\/simpledb\/.  Amazon SimpleDB. aws.amazon.com\/simpledb\/."},{"key":"e_1_2_1_2_1","unstructured":"Extended version. Yahoo! Labs Technical Report YL-2009-003 research.yahoo.com\/Technical_Reports.  Extended version. Yahoo! Labs Technical Report YL-2009-003 research.yahoo.com\/Technical_Reports."},{"key":"e_1_2_1_3_1","unstructured":"Oracle Database Data Warehousing Guide 10g release 2. download-west.oracle.com\/docs\/cd\/B19306_01\/-eerver.102\/b14223\/usingpe.htm.  Oracle Database Data Warehousing Guide 10g release 2. download-west.oracle.com\/docs\/cd\/B19306_01\/-eerver.102\/b14223\/usingpe.htm."},{"key":"e_1_2_1_4_1","unstructured":"SQL Data Services\/Azure Services Platform. www.microsoft.com\/azure\/data.mspx.  SQL Data Services\/Azure Services Platform. www.microsoft.com\/azure\/data.mspx."},{"key":"e_1_2_1_5_1","unstructured":"Teradata. www.teradata.com.  Teradata. www.teradata.com."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/253260.253322"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335224"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/223784.223876"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.50903"},{"key":"e_1_2_1_10_1","volume-title":"OSDI","author":"Chang F.","year":"2006","unstructured":"F. Chang : A distributed storage system for structured data . In OSDI , 2006 . F. Chang et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454167"},{"key":"e_1_2_1_12_1","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat . MapReduce: Simplified data processing on large clusters . In OSDI , 2004 . J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1294261.1294281"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.50905"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/129888.129894"},{"key":"e_1_2_1_16_1","volume-title":"Proc. USENIX Annual Technical Conference","author":"Eriksen M. A.","year":"2005","unstructured":"M. A. Eriksen . Trickle : A userland bandwidth shaper for Unix-like systems . In Proc. USENIX Annual Technical Conference , 2005 . M. A. Eriksen. Trickle: A userland bandwidth shaper for Unix-like systems. In Proc. USENIX Annual Technical Conference, 2005."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/846218.847234"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316729"},{"key":"e_1_2_1_19_1","volume-title":"Handbook of Algorithms and Theory of Computation","author":"Karger D.","year":"1998","unstructured":"D. Karger , C. Stein , and J. Wein . Handbook of Algorithms and Theory of Computation , chapter Scheduling Algorithms. CRC Press , 1998 . D. Karger, C. Stein, and J. Wein. Handbook of Algorithms and Theory of Computation, chapter Scheduling Algorithms. CRC Press, 1998."},{"key":"e_1_2_1_20_1","volume-title":"SIGMOD","author":"Lakshman A.","year":"2008","unstructured":"A. Lakshman , P. Malik , and K. Ranganathan . Cassandra: A structured storage system on a P2P network . In SIGMOD , 2008 . A. Lakshman, P. Malik, and K. Ranganathan. Cassandra: A structured storage system on a P2P network. In SIGMOD, 2008."},{"key":"e_1_2_1_21_1","volume-title":"SODA","author":"Sanders P.","year":"2000","unstructured":"P. Sanders , S. Egner , and J. Korst . Fast concurrent access to parallel disks . In SODA , 2000 . P. Sanders, S. Egner, and J. Korst. Fast concurrent access to parallel disks. In SODA, 2000."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539796299266"},{"key":"e_1_2_1_23_1","volume-title":"VLDB","author":"Stonebraker M.","year":"2005","unstructured":"M. Stonebraker : A column-oriented DBMS . In VLDB , 2005 . M. Stonebraker et al. C-Store: A column-oriented DBMS. In VLDB, 2005."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1064323.1064336"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.7155\/jgaa.00012"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687705","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:30:59Z","timestamp":1672227059000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687705"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687705"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687705","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}