{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T20:34:58Z","timestamp":1780346098762,"version":"3.54.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2020,8]]},"abstract":"<jats:p>Google's Dremel was one of the first systems that combined a set of architectural principles that have become a common practice in today's cloud-native analytics tools, including disaggregated storage and compute, in situ analysis, and columnar storage for semistructured data. In this paper, we discuss how these ideas evolved in the past decade and became the foundation for Google BigQuery.<\/jats:p>","DOI":"10.14778\/3415478.3415568","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T18:46:40Z","timestamp":1600109200000},"page":"3461-3472","source":"Crossref","is-referenced-by-count":65,"title":["Dremel"],"prefix":"10.14778","volume":"13","author":[{"given":"Sergey","family":"Melnik","sequence":"first","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andrey","family":"Gubarev","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jing Jing","family":"Long","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Geoffrey","family":"Romer","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shiva","family":"Shivakumar","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matt","family":"Tolton","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Theo","family":"Vassilakis","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hossein","family":"Ahmadi","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dan","family":"Delorey","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Slava","family":"Min","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mosha","family":"Pasumansky","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jeff","family":"Shute","sequence":"additional","affiliation":[{"name":"Google LLC"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"The Seattle Report on Database Research. ACM SIGMOD Record, 48(4)","author":"Abadi D.","year":"2020","unstructured":"D. Abadi, A. Ailamaki, D. Andersen, P. Bailis, M. Balazinska, P. Bernstein, P. Boncz, S. Chaudhuri, A. Cheung, A. Doan, et al. The Seattle Report on Database Research. ACM SIGMOD Record, 48(4), 2020."},{"key":"e_1_2_1_2_1","volume-title":"SIGMOD","author":"Abadi D.","year":"2006","unstructured":"D. Abadi, S. Madden, and M. Ferreira. Integrating Compression and Execution in Column-Oriented Database Systems. In SIGMOD, 2006."},{"key":"e_1_2_1_3_1","volume-title":"Storing and Querying Tree-Structured Records in Dremel. PVLDB, 7(12)","author":"Afrati F. N.","year":"2014","unstructured":"F. N. Afrati, D. Delorey, M. Pasumansky, and J. D. Ullman. Storing and Querying Tree-Structured Records in Dremel. PVLDB, 7(12), 2014."},{"key":"e_1_2_1_4_1","volume-title":"Google Cloud Blog","author":"Ahmadi H.","year":"2016","unstructured":"H. Ahmadi. In-memory Query Execution in Google BigQuery. Google Cloud Blog, Aug 2016."},{"key":"e_1_2_1_5_1","volume-title":"M7: Oracle's Next-Generation Sparc Processor","author":"Aingaran K.","year":"2015","unstructured":"K. Aingaran, S. Jairath, G. Konstadinidis, S. Leung, P. Loewenstein, C. McAllister, S. Phillips, Z. Radovic, R. Sivaramakrishnan, D. Smentek, et al. M7: Oracle's Next-Generation Sparc Processor. IEEE Micro, 35(2), 2015."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213864"},{"key":"e_1_2_1_7_1","volume-title":"SIGMOD","author":"Antonopoulos P.","year":"2019","unstructured":"P. Antonopoulos, A. Budovski, C. Diaconu, A. Hernandez Saenz, J. Hu, H. Kodavalla, D. Kossmann, S. Lingam, U. F. Minhas, N. Prakash, et al. Socrates: The New SQL Server in the Cloud. In SIGMOD, 2019."},{"key":"e_1_2_1_8_1","volume-title":"SIGMOD","author":"Bacon D. F.","year":"2017","unstructured":"D. F. Bacon, N. Bales, N. Bruno, B. F. Cooper, A. Dickinson, A. Fikes, C. Fraser, A. Gubarev, M. Joshi, E. Kogan, A. Lloyd, S. Melnik, R. Rao, D. Shue, C. Taylor, M. van der Holst, and D. Woodford. Spanner: Becoming a SQL System. In SIGMOD, 2017."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1643608"},{"key":"e_1_2_1_10_1","volume-title":"CIDR","author":"Bernstein P. A.","year":"2011","unstructured":"P. A. Bernstein, C. W. Reid, and S. Das. Hyder - A Transactional Record Manager for Shared Flash. In CIDR, 2011."},{"key":"e_1_2_1_11_1","volume-title":"Borg, Omega, and Kubernetes. Queue, 14(1)","author":"Burns B.","year":"2016","unstructured":"B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes. Borg, Omega, and Kubernetes. Queue, 14(1), 2016."},{"key":"e_1_2_1_12_1","volume-title":"Efficient Data-Parallel Pipelines. In PLDI","author":"Chambers C.","year":"2010","unstructured":"C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In PLDI, 2010."},{"key":"e_1_2_1_13_1","volume-title":"OSDI","author":"Chang F.","year":"2011","unstructured":"F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2011."},{"key":"e_1_2_1_14_1","volume-title":"Intl. Colloquium on Automata, Languages, and Programming","author":"Charikar M.","year":"2002","unstructured":"M. Charikar, K. Chen, and M. Farach-Colton. Finding Frequent Items in Data Streams. In Intl. Colloquium on Automata, Languages, and Programming. Springer, 2002."},{"key":"e_1_2_1_15_1","volume-title":"et al. Procella: Unifying Serving and Analytical Data at YouTube. PVLDB, 12(12)","author":"Chattopadhyay B.","year":"2019","unstructured":"B. Chattopadhyay, P. Dutta, W. Liu, O. Tinn, A. Mccormick, A. Mokashi, P. Harvey, H. Gonzalez, D. Lomax, S. Mittal, et al. Procella: Unifying Serving and Analytical Data at YouTube. PVLDB, 12(12), 2019."},{"key":"e_1_2_1_16_1","volume-title":"Tenzing: a SQL Implementation on the MapReduce Framework. PVLDB, 4(12)","author":"Chattopadhyay B.","year":"2011","unstructured":"B. Chattopadhyay, L. Lin, W. Liu, S. Mittal, P. Aragonda, V. Lychagina, Y. Kwon, and M. Wong. Tenzing: a SQL Implementation on the MapReduce Framework. PVLDB, 4(12), 2011."},{"key":"e_1_2_1_17_1","volume-title":"OSDI","author":"Corbett J. C.","year":"2012","unstructured":"J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's Globally-Distributed Database. In OSDI, 2012."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_19_1","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004."},{"issue":"6","key":"e_1_2_1_20_1","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1145\/1953122.1953147","volume":"54","author":"Franklin M. J.","year":"2011","unstructured":"M. J. Franklin. Technical Perspective - Data Analysis at Astonishing Speed. Commun. ACM, 54(6):113, 2011.","journal-title":"Astonishing Speed. Commun. ACM"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_22_1","volume-title":"Scientific Data Management in the Coming Decade. ACM SIGMOD Record, 34(4)","author":"Gray J.","year":"2005","unstructured":"J. Gray, D. T. Liu, M. Nieto-Santisteban, A. Szalay, D. J. DeWitt, and G. Heber. Scientific Data Management in the Coming Decade. ACM SIGMOD Record, 34(4), 2005."},{"key":"e_1_2_1_23_1","volume-title":"Near Real-Time, Scalable Data Warehousing. PVLDB, 7(12)","author":"Gupta A.","year":"2014","unstructured":"A. Gupta, F. Yang, J. Govig, A. Kirsch, K. Chan, K. Lai, S. Wu, S. G. Dhoot, A. R. Kumar, A. Agiwal, et al. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing. PVLDB, 7(12), 2014."},{"key":"e_1_2_1_24_1","volume-title":"Processing a Trillion Cells per Mouse Click. PVLDB, 5(11)","author":"Hall A.","year":"2012","unstructured":"A. Hall, O. Bachmann, R. Bussow, S. Ganceanu, and M. Nunkesser. Processing a Trillion Cells per Mouse Click. PVLDB, 5(11), 2012."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"M. Z. Hanani. An Optimal Evaluation of Boolean Expressions in an Online Query System. Commun. ACM 20(5) 1977.","DOI":"10.1145\/359581.359600"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452456"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/115790.115835"},{"key":"e_1_2_1_28_1","volume-title":"Proc. of the 11th European Conf. on Computer Systems","author":"Klimovic A.","year":"2016","unstructured":"A. Klimovic, C. Kozyrakis, E. Thereska, B. John, and S. Kumar. Flash Storage Disaggregation. In Proc. of the 11th European Conf. on Computer Systems, 2016."},{"issue":"3","key":"e_1_2_1_29_1","doi-asserted-by":"crossref","DOI":"10.1145\/2338626.2338633","volume":"37","author":"Lemire D.","year":"2012","unstructured":"D. Lemire, O. Kaser, and E. Gutarra. Reordering Rows for Better Compression: Beyond the Lexicographic Order. ACM Trans. on Database Systems (TODS), 37(3), 2012.","journal-title":"ACM Trans. on Database Systems (TODS)"},{"key":"e_1_2_1_30_1","volume-title":"Google Cloud Blog","author":"Marian T.","year":"2017","unstructured":"T. Marian, M. Dvorsk\u00fd, A. Kumar, and S. Sokolenko. Introducing Cloud Dataflow Shuffle: For up to 5X Performance Improvement in Data Analytic Pipelines. Google Cloud Blog, June 2017."},{"key":"e_1_2_1_31_1","volume-title":"GFS: Evolution on Fast-forward. Queue, 7(7)","author":"McKusick M. K.","year":"2009","unstructured":"M. K. McKusick and S. Quinlan. GFS: Evolution on Fast-forward. Queue, 7(7), 2009."},{"key":"e_1_2_1_32_1","volume-title":"Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB, 3(1)","author":"Melnik S.","year":"2010","unstructured":"S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB, 3(1), 2010."},{"key":"e_1_2_1_33_1","volume-title":"Dremel: Interactive Analysis of Web-Scale Datasets. Commun. ACM, 54(6)","author":"Melnik S.","year":"2011","unstructured":"S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. Commun. ACM, 54(6), 2011."},{"key":"e_1_2_1_34_1","volume-title":"Google Remakes Online Empire with `Colossus'. Wired [Online]. Available: http:\/\/www.wired.com\/2012\/07\/google-colossus\/","author":"Metz C.","year":"2012","unstructured":"C. Metz. Google Remakes Online Empire with `Colossus'. Wired [Online]. Available: http:\/\/www.wired.com\/2012\/07\/google-colossus\/, 2012."},{"key":"e_1_2_1_35_1","volume-title":"Apr","author":"Pasumansky M.","year":"2016","unstructured":"M. Pasumansky. Inside Capacitor, BigQuery's Next-Generation Columnar Storage Format. Google Cloud Blog, Apr 2016."},{"key":"e_1_2_1_36_1","first-page":"277","volume":"13","author":"Pike R.","year":"2005","unstructured":"R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming Journal, 13:277--298, 2005.","journal-title":"Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming Journal"},{"key":"e_1_2_1_37_1","unstructured":"Protocol Buffers: Developer Guide. Available at http:\/\/code.google.com\/apis\/protocolbuffers\/docs\/overview.html."},{"key":"e_1_2_1_38_1","volume-title":"Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE. PVLDB, 12(10)","author":"Qiao S.","year":"2019","unstructured":"S. Qiao, A. Nicoara, J. Sun, M. Friedman, H. Patel, and J. Ekanayake. Hyper Dimension Shuffle: Efficient Data Repartition at Petabyte Scale in SCOPE. PVLDB, 12(10), 2019."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056100"},{"key":"e_1_2_1_40_1","volume-title":"F1 Query: Declarative Querying at Scale. PVLDB, 11(12)","author":"Samwel B.","year":"2018","unstructured":"B. Samwel, J. Cieslewicz, B. Handy, J. Govig, P. Venetis, C. Yang, K. Peters, J. Shute, D. Tenedorio, H. Apte, et al. F1 Query: Declarative Querying at Scale. PVLDB, 11(12), 2018."},{"key":"e_1_2_1_41_1","volume-title":"F1: A Distributed SQL Database that Scales. PVLDB, 6(11)","author":"Shute J.","year":"2013","unstructured":"J. Shute, R. Vingralek, B. Samwel, B. Handy, C. Whipkey, E. Rollins, M. O. K. Littlefield, D. Menestrina, S. E. J. Cieslewicz, I. Rae, et al. F1: A Distributed SQL Database that Scales. PVLDB, 6(11), 2013."},{"key":"e_1_2_1_42_1","volume-title":"Thrift: Scalable Cross-Language Services Implementation. Facebook White Paper, 5(8)","author":"Slee M.","year":"2007","unstructured":"M. Slee, A. Agarwal, and M. Kwiatkowski. Thrift: Scalable Cross-Language Services Implementation. Facebook White Paper, 5(8), 2007."},{"key":"e_1_2_1_43_1","volume-title":"MapReduce and Parallel DBMSs: Friends or Foes? Commun. ACM, 53(1)","author":"Stonebraker M.","year":"2010","unstructured":"M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. MapReduce and Parallel DBMSs: Friends or Foes? Commun. ACM, 53(1), 2010."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056101"},{"key":"e_1_2_1_45_1","volume-title":"Proc. of the 10th European Conf. on Computer Systems","author":"Verma A.","year":"2015","unstructured":"A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-Scale Cluster Management at Google with Borg. In Proc. of the 10th European Conf. on Computer Systems, 2015."},{"key":"e_1_2_1_46_1","volume-title":"F1 Lightning: HTAP as a Service. PVLDB, 13(12)","author":"Yang J.","year":"2020","unstructured":"J. Yang, I. Rae, J. Xu, J. Shute, Z. Yuan, K. Lau, Q. Zeng, X. Zhao, J. Ma, Z. Chen, Y. Gao, Q. Dong, J. Zhou, J. Wood, G. Graefe, J. Naughton, and J. Cieslewicz. F1 Lightning: HTAP as a Service. PVLDB, 13(12), 2020."},{"key":"e_1_2_1_47_1","volume-title":"Proc. of the 13th EuroSys Conf.","author":"Zhang H.","year":"2018","unstructured":"H. Zhang, B. Cho, E. Seyfe, A. Ching, and M. J. Freedman. Riffle: Optimized Shuffle Service for Large-Scale Data Analytics. In Proc. of the 13th EuroSys Conf., 2018."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3415478.3415568","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T02:13:03Z","timestamp":1758075183000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3415478.3415568"}},"subtitle":["a decade of interactive SQL analysis at web scale"],"short-title":[],"issued":{"date-parts":[[2020,8]]},"references-count":47,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020,8]]}},"alternative-id":["10.14778\/3415478.3415568"],"URL":"https:\/\/doi.org\/10.14778\/3415478.3415568","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2020,8]]}}}