{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T05:27:43Z","timestamp":1770528463109,"version":"3.49.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>Join is an important database operation. As computer architectures evolve, the best join algorithm may change hand. This paper re-examines two popular join algorithms -- hash join and sort-merge join -- to determine if the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantages of latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes from 64K to 128M tuples and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation that achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that current architectural trends of wider SIMD, more cores, and smaller memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multicore implementations of hash join and sort-merge join which consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.<\/jats:p>","DOI":"10.14778\/1687553.1687564","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1378-1389","source":"Crossref","is-referenced-by-count":218,"title":["Sort vs. Hash revisited"],"prefix":"10.14778","volume":"2","author":[{"given":"Changkyu","family":"Kim","sequence":"first","affiliation":[{"name":"Intel Corporation"}]},{"given":"Tim","family":"Kaldewey","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}]},{"given":"Victor W.","family":"Lee","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Eric","family":"Sedlar","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}]},{"given":"Anthony D.","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Nadathur","family":"Satish","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Jatin","family":"Chhugani","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Andrea","family":"Di Blas","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}]},{"given":"Pradeep","family":"Dubey","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/11575832_11"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.164.0363"},{"key":"e_1_2_1_4_1","first-page":"35","volume-title":"Synthesis of Parallel Algorithms","author":"Blelloch G. E.","year":"1993","unstructured":"G. E. Blelloch . Synthesis of Parallel Algorithms , chapter Prefix sums and their applications, pages 35 -- 60 . Morgan Kaufmann , 1993 . G. E. Blelloch. Synthesis of Parallel Algorithms, chapter Prefix sums and their applications, pages 35--60. Morgan Kaufmann, 1993."},{"key":"e_1_2_1_5_1","first-page":"54","volume-title":"VLDB","author":"Boncz P. A.","year":"1999","unstructured":"P. A. Boncz , S. Manegold , and M. L. Kersten . Database architecture optimized for the new bottleneck: Memory access . In VLDB , pages 54 -- 65 , 1999 . P. A. Boncz, S. Manegold, and M. L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In VLDB, pages 54--65, 1999."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/977401.978128"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454171"},{"key":"e_1_2_1_8_1","first-page":"339","volume-title":"VLDB","author":"Cieslewicz J.","year":"2007","unstructured":"J. Cieslewicz and K. A. Ross . Adaptive aggregation on chip multiprocessors . In VLDB , pages 339 -- 350 , 2007 . J. Cieslewicz and K. A. Ross. Adaptive aggregation on chip multiprocessors. In VLDB, pages 339--350, 2007."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1457150.1457156"},{"key":"e_1_2_1_10_1","first-page":"151","volume-title":"VLDB","author":"DeWitt D. J.","year":"1985","unstructured":"D. J. DeWitt and R. H. Gerber . Multiprocessor hash-based join algorithms . In VLDB , pages 151 -- 164 , 1985 . D. J. DeWitt and R. H. Gerber. Multiprocessor hash-based join algorithms. In VLDB, pages 151--164, 1985."},{"key":"e_1_2_1_11_1","volume-title":"Inside Intel core microarchitecture and smart memory access. White Paper","author":"Doweck J.","year":"2006","unstructured":"J. Doweck . Inside Intel core microarchitecture and smart memory access. White Paper , Intel Corporation Jul 2006 . J. Doweck. Inside Intel core microarchitecture and smart memory access. White Paper, Intel Corporation Jul 2006."},{"key":"e_1_2_1_12_1","first-page":"363","volume-title":"VLDB","author":"Gedik B.","year":"2007","unstructured":"B. Gedik , P. S. Yu , and R. Bordawekar . Executing stream joins on the cell processor . In VLDB , pages 363 -- 374 , 2007 . B. Gedik, P. S. Yu, and R. Bordawekar. Executing stream joins on the cell processor. In VLDB, pages 363--374, 2007."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142511"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.334883"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/191839.191886"},{"key":"e_1_2_1_16_1","first-page":"45","volume-title":"Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium","author":"Gre\u00df A.","year":"2006","unstructured":"A. Gre\u00df and G. Zachmann . GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures . In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium , page 45 , Apr. 2006 . A. Gre\u00df and G. Zachmann. GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, page 45, Apr. 2006."},{"key":"e_1_2_1_17_1","first-page":"79","volume-title":"CIDR","author":"Hardavellas N.","year":"2007","unstructured":"N. Hardavellas , I. Pandis , R. Johnson , N. Mancheril , A. Ailamaki , and B. Falsafi . Database servers on chip multiprocessors: Limitations and opportunities . In CIDR , pages 79 -- 87 , 2007 . N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In CIDR, pages 79--87, 2007."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376670"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/7902.7903"},{"key":"e_1_2_1_20_1","first-page":"525","volume-title":"VLDB","author":"Hua K. A.","year":"1991","unstructured":"K. A. Hua and C. Lee . Handling data skew in multiprocessor database computers using partition tuning . In VLDB , pages 525 -- 535 , 1991 . K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. In VLDB, pages 525--535, 1991."},{"key":"e_1_2_1_21_1","unstructured":"Intel Advanced Vector Extensions Programming Reference. 2008 http:\/\/softwarecommunity.intel.com\/isn\/downloads\/intelavx\/Intel-AVX-Programming-Reference-31943302.pdf.  Intel Advanced Vector Extensions Programming Reference. 2008 http:\/\/softwarecommunity.intel.com\/isn\/downloads\/intelavx\/Intel-AVX-Programming-Reference-31943302.pdf."},{"key":"e_1_2_1_22_1","unstructured":"Intel SSE4 programming reference. 2007 http:\/\/www.intel.com\/design\/processor\/manuals\/253667.pdf.  Intel SSE4 programming reference. 2007 http:\/\/www.intel.com\/design\/processor\/manuals\/253667.pdf."},{"key":"e_1_2_1_23_1","volume-title":"Application of hash to data base machine and its architecture. New Generation Comput., 1(1)","author":"Kitsuregawa M.","year":"1983","unstructured":"M. Kitsuregawa , H. Tanaka , and T. Moto-Oka . Application of hash to data base machine and its architecture. New Generation Comput., 1(1) , 1983 . M. Kitsuregawa, H. Tanaka, and T. Moto-Oka. Application of hash to data base machine and its architecture. New Generation Comput., 1(1), 1983."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.38"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/645913.671312"},{"key":"e_1_2_1_26_1","first-page":"198","volume-title":"D. McLeod, R. Sacks-Davis, and H.-J","author":"Lu H.","year":"1990","unstructured":"H. Lu , K.-L. Tan , and M.-C. Shan . Hash-based join algorithms for multiprocessor computers . In D. McLeod, R. Sacks-Davis, and H.-J . Schek, editors, VLDB , pages 198 -- 209 , 1990 . H. Lu, K.-L. Tan, and M.-C. Shan. Hash-based join algorithms for multiprocessor computers. In D. McLeod, R. Sacks-Davis, and H.-J. Schek, editors, VLDB, pages 198--209, 1990."},{"key":"e_1_2_1_27_1","first-page":"339","volume-title":"VLDB","author":"Manegold S.","year":"2000","unstructured":"S. Manegold , P. A. Boncz , and M. L. Kersten . What happens during a join? dissecting cpu and memory optimization effects . In VLDB , pages 339 -- 350 , 2000 . S. Manegold, P. A. Boncz, and M. L. Kersten. What happens during a join? dissecting cpu and memory optimization effects. In VLDB, pages 339--350, 2000."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2002.1019210"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.86103"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/211390"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/191843.191884"},{"key":"e_1_2_1_32_1","first-page":"41","volume-title":"Graphics Hardware 2003","author":"Purcell T. J.","year":"2003","unstructured":"T. J. Purcell , C. Donner , M. Cammarano , H. W. Jensen , and P. Hanrahan . Photon Mapping on Programmable Graphics Hardware . In Graphics Hardware 2003 , pages 41 -- 50 , July 2003 . T. J. Purcell, C. Donner, M. Cammarano, H. W. Jensen, and P. Hanrahan. Photon Mapping on Programmable Graphics Hardware. In Graphics Hardware 2003, pages 41--50, July 2003."},{"key":"e_1_2_1_33_1","volume-title":"HPEC","author":"Reilly M.","year":"2008","unstructured":"M. Reilly . When multicore isn't enough: Trends and the future for multi-multicore systems . In HPEC , 2008 . M. Reilly. When multicore isn't enough: Trends and the future for multi-multicore systems. In HPEC, 2008."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1399504.1360617"},{"key":"e_1_2_1_35_1","first-page":"510","volume-title":"VLDB","author":"Shatdal A.","year":"1994","unstructured":"A. Shatdal , C. Kant , and J. F. Naughton . Cache conscious algorithms for relational query processing . In VLDB , pages 510 -- 521 , 1994 . A. Shatdal, C. Kant, and J. F. Naughton. Cache conscious algorithms for relational query processing. In VLDB, pages 510--521, 1994."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.205654"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.250117"},{"key":"e_1_2_1_38_1","first-page":"186","volume-title":"VLDB","author":"Zeller H.","year":"1990","unstructured":"H. Zeller and J. Gray . An adaptive hash join algorithm for multiuser environments . In VLDB , pages 186 -- 197 , 1990 . H. Zeller and J. Gray. An adaptive hash join algorithm for multiuser environments. In VLDB, pages 186--197, 1990."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564709"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687553.1687564","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:57:31Z","timestamp":1672225051000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687553.1687564"}},"subtitle":["fast join implementation on modern multi-core CPUs"],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687553.1687564"],"URL":"https:\/\/doi.org\/10.14778\/1687553.1687564","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}