{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,17]],"date-time":"2024-08-17T13:09:05Z","timestamp":1723900145210},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,3]]},"abstract":"<jats:p>When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.<\/jats:p>","DOI":"10.14778\/3450980.3450991","type":"journal-article","created":{"date-parts":[[2021,4,12]],"date-time":"2021-04-12T16:17:16Z","timestamp":1618244236000},"page":"1228-1240","source":"Crossref","is-referenced-by-count":6,"title":["Distributed numerical and machine learning computations via two-phase execution of aggregated join trees"],"prefix":"10.14778","volume":"14","author":[{"given":"Dimitrije","family":"Jankov","sequence":"first","affiliation":[{"name":"Rice University"}]},{"given":"Binhang","family":"Yuan","sequence":"additional","affiliation":[{"name":"Rice University"}]},{"given":"Shangyu","family":"Luo","sequence":"additional","affiliation":[{"name":"Rice University"}]},{"given":"Chris","family":"Jermaine","sequence":"additional","affiliation":[{"name":"Rice University"}]}],"member":"320","published-online":{"date-parts":[[2021,4,12]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Intel\u00ae Math Kernel Library. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/math-kernel-library.html. Accessed: 2021-03-10."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. MXNet. https:\/\/mxnet.apache.org\/."},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. Pytorch. https:\/\/pytorch.org\/. Accessed: 2021-03-10."},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. Pytorch Einstein Notation. https:\/\/pytorch.org\/docs\/master\/generated\/torch.einsum.html. Accessed: 2021-03-10."},{"key":"e_1_2_1_5_1","unstructured":"[n.d.]. Ring-allreduce. https:\/\/andrew.gibiansky.com\/blog\/machine-learning\/baidu-allreduce\/. Accessed: 2021-03-10."},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. ScaLAPACK. http:\/\/www.netlib.org\/scalapack\/. Accessed: 2021-03-10."},{"key":"e_1_2_1_7_1","unstructured":"[n.d.]. scikit. https:\/\/scikit-learn.org\/stable\/. Accessed: 2021-03-10."},{"key":"e_1_2_1_8_1","unstructured":"[n.d.]. Tensorflow. https:\/\/www.tensorflow.org\/. Accessed: 2021-03-10."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196959.3196960"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2015836.2015849"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.395.0575"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164231"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/319628.319650"},{"key":"e_1_2_1_14_1","volume-title":"Jan Van den Bussche, and Timmy Weerwag","author":"Brijder Robert","year":"2017"},{"key":"e_1_2_1_15_1","unstructured":"Tom B Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. arXiv (2020)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275492"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676370"},{"key":"e_1_2_1_18_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv","author":"Devlin Jacob","year":"2018"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/509252.509292"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(03)00026-6"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767930"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083650"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304208"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319865"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367510"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1979.234179"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/141484.130292"},{"key":"e_1_2_1_28_1","volume-title":"Gpipe: Efficient training of giant neural networks using pipeline parallelism. arXiv","author":"Huang Yanping","year":"2019"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3317315.3317323"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380575"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610531"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209889.3209896"},{"key":"e_1_2_1_33_1","volume-title":"Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv","author":"Lepikhin Dmitry","year":"2020"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137812"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3291264.3291273"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3277006.3277013"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783381"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767755"},{"key":"e_1_2_1_40_1","volume-title":"Memory-Efficient Pipeline-Parallel DNN Training. arXiv","author":"Narayanan Deepak","year":"2020"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610521"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3199517.3199522"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/582095.582099"},{"key":"e_1_2_1_44_1","volume-title":"arXiv","author":"Vaswani Ashish","year":"2017"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/PARBSE.1990.77227"},{"key":"e_1_2_1_46_1","volume-title":"PipeMare: Asynchronous Pipeline Parallel DNN Training. arXiv","author":"Yang Bowen","year":"2019"},{"key":"e_1_2_1_47_1","volume-title":"Tensor Relational Algebra for Machine Learning System Design. arXiv","author":"Yuan Binhang","year":"2020"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196933"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3450980.3450991","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:14:27Z","timestamp":1672222467000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3450980.3450991"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3]]},"references-count":48,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2021,3]]}},"alternative-id":["10.14778\/3450980.3450991"],"URL":"https:\/\/doi.org\/10.14778\/3450980.3450991","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,3]]}}}