{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T05:11:40Z","timestamp":1770959500263,"version":"3.50.1"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,9,30]],"date-time":"2018-09-30T00:00:00Z","timestamp":1538265600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-0915956, IIS-1422488"],"award-info":[{"award-number":["IIS-0915956, IIS-1422488"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Oracle Corp."}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2018,9,30]]},"abstract":"<jats:p>Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs but could avoid redundant transfers of tuples across the network. We introduce track join, a new distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Track join explicitly detects and exploits locality, also allowing for advanced placement of tuples beyond hash partitioning on a single attribute. We propose a novel data placement algorithm based on track join that minimizes the total network cost of multiple joins across different dimensions in an analytical workload. Our evaluation shows that track join outperforms hash join on the most expensive queries of real workloads regarding both network traffic and execution time. Finally, we show that our data placement optimization approach is both robust and effective in minimizing the total network cost of joins in analytical workloads.<\/jats:p>","DOI":"10.1145\/3241039","type":"journal-article","created":{"date-parts":[[2018,11,16]],"date-time":"2018-11-16T13:08:54Z","timestamp":1542373734000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Distributed Joins and Data Placement for Minimal Network Traffic"],"prefix":"10.1145","volume":"43","author":[{"given":"Orestis","family":"Polychroniou","sequence":"first","affiliation":[{"name":"Columbia University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wangda","family":"Zhang","sequence":"additional","affiliation":[{"name":"Columbia University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kenneth A.","family":"Ross","sequence":"additional","affiliation":[{"name":"Columbia University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,11,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1739041.1739056"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336678"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732219.2732227"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/322234.322238"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/319628.319650"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_7_1","unstructured":"Surajit Chaudhuri and Vivek Narasayya. 2016. TPC-H Data Generation with Skew. Retrieved from https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id&equals;52430.  Surajit Chaudhuri and Vivek Narasayya. 2016. TPC-H Data Generation with Skew. Retrieved from https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id&equals;52430."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750545"},{"key":"e_1_2_1_9_1","unstructured":"Transaction Processing Performance Council. 2014. The TPC-H Benchmark 2.17.1. Retrieved from http:\/\/www.tpc.org\/tpch.  Transaction Processing Performance Council. 2014. The TPC-H Benchmark 2.17.1. Retrieved from http:\/\/www.tpc.org\/tpch."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920853"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the OSDI. 137--150","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat . 2004 . MapReduce: Simplified data processing on large clusters . In Proceedings of the OSDI. 137--150 . Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the OSDI. 137--150."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.50905"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/129888.129894"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/602259.602261"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723709"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376727"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732279.2732281"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002943"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/509252.509292"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1565694.1565701"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2618243.2618258"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1921071.1921077"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516360.1516365"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687564"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213965"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF03037022"},{"key":"e_1_2_1_27_1","volume-title":"Data placement and replica selection for improving co-location in distributed environments. CoRR abs\/1302.4168","author":"Kumar K. Ashwin","year":"2013","unstructured":"K. Ashwin Kumar , Amol Deshpande , and Samir Khuller . 2013. Data placement and replica selection for improving co-location in distributed environments. CoRR abs\/1302.4168 ( 2013 ). Retrieved from http:\/\/arxiv.org\/abs\/1302.4168. K. Ashwin Kumar, Amol Deshpande, and Samir Khuller. 2013. Data placement and replica selection for improving co-location in distributed environments. CoRR abs\/1302.4168 (2013). Retrieved from http:\/\/arxiv.org\/abs\/1302.4168."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610507"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2203"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/221270.221360"},{"key":"e_1_2_1_31_1","volume-title":"Lohman","author":"Mackert Lothar F.","year":"1986","unstructured":"Lothar F. Mackert and Guy M . Lohman . 1986 . R<sup>*<\/sup> optimizer validation and performance evaluation for distributed queries. In Proceedings of the VLDB. 149--159. Lothar F. Mackert and Guy M. Lohman. 1986. R<sup>*<\/sup> optimizer validation and performance evaluation for distributed queries. In Proceedings of the VLDB. 149--159."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780000031"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780050033"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2747644"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/32.52778"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989444"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989423"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213844"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559865"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007336"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2747645"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610522"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2771937.2771943"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610521"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452427"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536233"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564757"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498324"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816684"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.109109"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807207"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/67544.66937"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.250116"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the VLDB. 273--284","author":"Thomas","unstructured":"Thomas St\u00f6hr et al. 2000. Multi-dimensional database allocation for parallel data warehouses . In Proceedings of the VLDB. 273--284 . Thomas St\u00f6hr et al. 2000. Multi-dimensional database allocation for parallel data warehouses. In Proceedings of the VLDB. 273--284."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the VLDB. 553--564","author":"Mike","unstructured":"Mike Stonebraker et al. 2005. C-store: A column-oriented DBMS . In Proceedings of the VLDB. 553--564 . Mike Stonebraker et al. 2005. C-store: A column-oriented DBMS. In Proceedings of the VLDB. 553--564."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the VLDB. 1150--1160","author":"Michael","unstructured":"Michael Stonebraker et al. 2007. The end of an architectural era: (It\u2019s time for a complete rewrite) . In Proceedings of the VLDB. 1150--1160 . Michael Stonebraker et al. 2007. The end of an architectural era: (It\u2019s time for a complete rewrite). In Proceedings of the VLDB. 1150--1160."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735508.2735514"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2972950"},{"key":"e_1_2_1_59_1","volume-title":"Franklin","author":"Urhan Tolga","year":"2000","unstructured":"Tolga Urhan and Michael J . Franklin . 2000 . XJoin: A reactively scheduled pipelined join operator. IEEE Data Engin. Bulletin 23 (June 2000), 27--33. Tolga Urhan and Michael J. Franklin. 2000. XJoin: A reactively scheduled pipelined join operator. IEEE Data Engin. Bulletin 23 (June 2000), 27--33."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/2033408.2033427"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687671"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1995441.1995442"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463701"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723718"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3241039","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3241039","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3241039","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:43:46Z","timestamp":1750207426000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3241039"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,30]]},"references-count":65,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,9,30]]}},"alternative-id":["10.1145\/3241039"],"URL":"https:\/\/doi.org\/10.1145\/3241039","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"value":"0362-5915","type":"print"},{"value":"1557-4644","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,9,30]]},"assertion":[{"value":"2017-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}