{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T17:13:44Z","timestamp":1780766024855,"version":"3.54.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2020,10]]},"abstract":"<jats:p>Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. 
In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes.<\/jats:p>","DOI":"10.14778\/3425879.3425880","type":"journal-article","created":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T02:45:23Z","timestamp":1606272323000},"page":"74-86","source":"Crossref","is-referenced-by-count":110,"title":["Tsunami"],"prefix":"10.14778","volume":"14","author":[{"given":"Jialin","family":"Ding","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vikram","family":"Nathan","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohammad","family":"Alizadeh","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tim","family":"Kraska","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,11,16]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon AWS. 2016. Amazon Redshift Engineering's Advanced Table Design Playbook: Compound and Interleaved Sort Keys. 
https:\/\/aws.amazon.com\/blogs\/big-data\/amazon-redshift-engineerings-advanced-table-design-playbook-compound-and-interleaved-sort-keys\/."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2318857.2254766"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/93605.98741"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360322.3361007"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315509"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/645923.673646"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807152"},{"key":"e_1_2_1_8_1","unstructured":"Databricks Engineering Blog. [n.d.]. Processing Petabytes of Data in Seconds with Databricks Delta. https:\/\/databricks.com\/blog\/2018\/07\/31\/processing-petabytes-of-data-in-seconds-with-databricks-delta.html."},{"key":"e_1_2_1_9_1","volume-title":"2020 Conference on Extending Database Technology (EDBT.","author":"Davitkova Angjela","year":"2020"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389711"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083658"},{"key":"e_1_2_1_12_1","volume-title":"Daily Historical Stock Prices (1970 -","author":"Hallmark Evan","year":"2018"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389135"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/280277.280279"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319860"},{"key":"e_1_2_1_16_1","unstructured":"IBM. [n.d.]. 
The Spatial Index. https:\/\/www.ibm.com\/support\/knowledgecenter\/SSGU8G_12.1.0\/com.ibm.spatial.doc\/ids_spat_024.htm."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007641"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1071610.1071612"},{"key":"e_1_2_1_19_1","unstructured":"Irfan Khan. 2012. Falling RAM prices drive in-memory database surge. https:\/\/www.itworld.com\/article\/2718428\/falling-ram-prices-drive-in-memory-database-surge.html."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687765"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920979"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3401071.3401659"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192968"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389703"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196908"},{"key":"e_1_2_1_27_1","volume-title":"Octree Encoding: A New Technique for the Representation, Manipulation and Display of Arbitrary 3-D Objects by Computer. Technical Report.","author":"Meagher Donald","year":"1980"},{"key":"e_1_2_1_28_1","unstructured":"Microsoft SQL Server. 2016. Spatial Indexes Overview. https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/spatial\/spatial-indexes-overview?view=sql-server-2017."},{"key":"e_1_2_1_29_1","volume-title":"A computer Oriented Geodetic Data Base","author":"Morton G. 
M."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380579"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/348.318586"},{"key":"e_1_2_1_32_1","unstructured":"NYC Taxi & Limousine Commission. 2020. TLC Trip Record Data. https:\/\/www1.nyc.gov\/site\/tlc\/about\/tlc-trip-record-data.page."},{"key":"e_1_2_1_33_1","unstructured":"Beng Chin Ooi Ron Sacks-davis and Jiawei Han. 2019. Indexing in Spatial Databases."},{"key":"e_1_2_1_34_1","unstructured":"Oracle Database Data Warehousing Guide. 2017. Attribute Clustering. https:\/\/docs.oracle.com\/database\/121\/DWHSG\/attcluster.htm."},{"key":"e_1_2_1_35_1","unstructured":"Oracle Inc. [n.d.]. Oracle Database In-Memory. https:\/\/www.oracle.com\/database\/technologies\/in-memory.html."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/645926.671872"},{"key":"e_1_2_1_37_1","unstructured":"RocksDB. 2020. RocksDB. https:\/\/rocksdb.org\/."},{"key":"e_1_2_1_38_1","unstructured":"Scipy.org. [n.d.]. scipy.optimize.basinhopping. 
https:\/\/docs.scipy.org\/doc\/scipy-0.18.1\/reference\/generated\/scipy.optimize.basinhopping.html."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/320473.320484"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3137586.3137590"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374547"},{"key":"e_1_2_1_42_1","unstructured":"TPC. 2019. TPC-H. http:\/\/www.tpc.org\/tpch\/."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/846219.847390"},{"key":"e_1_2_1_44_1","volume-title":"Learned Index for Spatial Queries. In 2019 20th IEEE International Conference on Mobile Data Management (MDM). 569--574","author":"Wang H."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319861"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389770"},{"key":"e_1_2_1_47_1","unstructured":"Zack Slayton. 2017. Z-Order Indexing for Multifaceted Queries in Amazon DynamoDB. 
https:\/\/aws.amazon.com\/blogs\/database\/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-1\/."},{"key":"e_1_2_1_48_1","volume-title":"18th USENIX Conference on File and Storage Technologies (FAST 20)","author":"Dong Siying"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3425879.3425880","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:05:02Z","timestamp":1672225502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3425879.3425880"}},"subtitle":["a learned multi-dimensional index for correlated data and skewed workloads"],"short-title":[],"issued":{"date-parts":[[2020,10]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,10]]}},"alternative-id":["10.14778\/3425879.3425880"],"URL":"https:\/\/doi.org\/10.14778\/3425879.3425880","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2020,10]]}}}