{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T00:57:27Z","timestamp":1770339447047,"version":"3.49.0"},"reference-count":79,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2021,5,31]],"date-time":"2021-05-31T00:00:00Z","timestamp":1622419200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,31]],"date-time":"2021-05-31T00:00:00Z","timestamp":1622419200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["The VLDB Journal"],"published-print":{"date-parts":[[2021,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.<\/jats:p>","DOI":"10.1007\/s00778-021-00665-6","type":"journal-article","created":{"date-parts":[[2021,5,31]],"date-time":"2021-05-31T16:31:33Z","timestamp":1622478693000},"page":"859-881","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Formal semantics and high performance in declarative machine learning using Datalog"],"prefix":"10.1007","volume":"30","author":[{"given":"Jin","family":"Wang","sequence":"first","affiliation":[]},{"given":"Jiacheng","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Mingda","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jiaqi","family":"Gu","sequence":"additional","affiliation":[]},{"given":"Ariyam","family":"Das","sequence":"additional","affiliation":[]},{"given":"Carlo","family":"Zaniolo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,5,31]]},"reference":[{"key":"665_CR1","first-page":"34:1","volume":"17","author":"X Meng","year":"2016","unstructured":"Meng, X., Bradley, J.K., Yavuz, B., et al.: Mllib: machine learning in apache spark. J. Mach. Learn. Res. 17, 34:1\u201334:7 (2016)","journal-title":"J. Mach. Learn. Res."},{"key":"665_CR2","unstructured":"Apache Mahout . https:\/\/mahout.apache.org\/"},{"issue":"12","key":"665_CR3","doi-asserted-by":"publisher","first-page":"1700","DOI":"10.14778\/2367502.2367510","volume":"5","author":"JM Hellerstein","year":"2012","unstructured":"Hellerstein, J.M., R\u00e9, C., Schoppmann, F., Wang, D.Z., Fratkin, E., Gorajek, A., Ng, K.S., Welton, C., Feng, X., Li, K., Kumar, A.: The madlib analytics library or MAD skills, the SQL. Proc. VLDB Endow. 5(12), 1700\u20131711 (2012)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR4","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, J., Li, M., Das, A., Gu, J., Zaniolo, C.: Kddlog: Performance and scalability in knowledge discovery by declarative queries with aggregates. In: IEEE International Conference on Data Engineering (ICDE), (2021)","DOI":"10.1109\/ICDE51399.2021.00113"},{"issue":"9","key":"665_CR5","doi-asserted-by":"publisher","first-page":"975","DOI":"10.14778\/3213880.3213888","volume":"11","author":"L Bellomarini","year":"2018","unstructured":"Bellomarini, L., Sallinger, E., Gottlob, G.: The vadalog system: datalog-based reasoning for knowledge graphs. Proc. VLDB Endow. 11(9), 975\u2013987 (2018)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR6","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Acharya, A., Chen, H., Arora, S., Chen, A., Liu, V., Loo, B.T.: Optimizing declarative graph queries at large scale. In: ACM International Conference on Management of Data, SIGMOD Conference, 1411\u20131428, (2019)","DOI":"10.1145\/3299869.3300064"},{"issue":"14","key":"665_CR7","doi-asserted-by":"publisher","first-page":"1906","DOI":"10.14778\/2556549.2556572","volume":"6","author":"J Seo","year":"2013","unstructured":"Seo, J., Park, J., Shin, J., Lam, M.S.: Distributed socialite: a datalog-based language for large-scale graph analysis. Proc. VLDB Endow. 6(14), 1906\u20131917 (2013)","journal-title":"Proc. VLDB Endow."},{"issue":"4","key":"665_CR8","doi-asserted-by":"publisher","first-page":"471","DOI":"10.1007\/s00778-012-0299-1","volume":"22","author":"M Mazuran","year":"2013","unstructured":"Mazuran, M., Serra, E., Zaniolo, C.: Extending the power of datalog recursion. VLDB J. 22(4), 471\u2013493 (2013)","journal-title":"VLDB J."},{"issue":"5\u20136","key":"665_CR9","doi-asserted-by":"publisher","first-page":"1048","DOI":"10.1017\/S1471068417000436","volume":"17","author":"C Zaniolo","year":"2017","unstructured":"Zaniolo, C., Yang, M., Das, A., Shkapsky, A., Condie, T., Interlandi, M.: Fixpoint semantics and optimization of recursive datalog programs with aggregates. Theory Pract. Log. Program. 17(5\u20136), 1048\u20131065 (2017)","journal-title":"Theory Pract. Log. Program."},{"key":"665_CR10","unstructured":"Zaniolo, C., Yang, M., Interlandi, M., Das, A., Shkapsky, A., Condie, T.: Declarative bigdata algorithms via aggregates and relational database dependencies. In: Alberto Mendelzon International Workshop on Foundations of Data Management, (2018)"},{"key":"665_CR11","doi-asserted-by":"crossref","unstructured":"Feng, X., Kumar, A., Recht, B., R\u00e9, C.: Towards a unified architecture for in-rdbms analytics. In: ACM International Conference on Management of Data, SIGMOD Conference, 325\u2013336, (2012)","DOI":"10.1145\/2213836.2213874"},{"key":"665_CR12","doi-asserted-by":"crossref","unstructured":"Luo, S., Gao, Z.J., Gubanov, M.N., Perez, L.L., Jermaine, C.M.: Scalable linear algebra on a relational database system. In: IEEE International Conference on Data Engineering (ICDE), 523\u2013534, (2017)","DOI":"10.1109\/ICDE.2017.108"},{"issue":"7","key":"665_CR13","doi-asserted-by":"publisher","first-page":"822","DOI":"10.14778\/3317315.3317323","volume":"12","author":"D Jankov","year":"2019","unstructured":"Jankov, D., Luo, S., Yuan, B., Cai, Z., Zou, J., Jermaine, C., Gao, Z.J.: Declarative recursive computation on an RDBMS. Proc. VLDB Endow. 12(7), 822\u2013835 (2019)","journal-title":"Proc. VLDB Endow."},{"issue":"12","key":"665_CR14","doi-asserted-by":"publisher","first-page":"1933","DOI":"10.14778\/3137765.3137812","volume":"10","author":"X Li","year":"2017","unstructured":"Li, X., Cui, B., Chen, Y., Wu, W., Zhang, C.: Mlog: towards declarative in-database machine learning. Proc. VLDB Endow. 10(12), 1933\u20131936 (2017)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR15","doi-asserted-by":"crossref","unstructured":"Shkapsky, A., Yang, M., Interlandi, M., Chiu, H., Condie, T., Zaniolo, C.: Big data analytics with datalog queries on spark. In: ACM International Conference on Management of Data, SIGMOD Conference, 1135\u20131149, (2016)","DOI":"10.1145\/2882903.2915229"},{"issue":"5\u20136","key":"665_CR16","doi-asserted-by":"publisher","first-page":"806","DOI":"10.1017\/S1471068418000418","volume":"18","author":"T Condie","year":"2018","unstructured":"Condie, T., Das, A., Interlandi, M., Shkapsky, A., Yang, M., Zaniolo, C.: Scaling-up reasoning and advanced analytics on bigdata. Theory Pract. Log. Program. 18(5\u20136), 806\u2013845 (2018)","journal-title":"Theory Pract. Log. Program."},{"key":"665_CR17","volume-title":"Foundations of Databases","author":"S Abiteboul","year":"1995","unstructured":"Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)"},{"key":"665_CR18","doi-asserted-by":"crossref","unstructured":"Shkapsky, A., Yang, M., Zaniolo, C.: Optimizing recursive queries with monotonic aggregates in deals. In: IEEE International Conference on Data Engineering (ICDE), 867\u2013878, (2015)","DOI":"10.1109\/ICDE.2015.7113340"},{"issue":"2","key":"665_CR19","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/s00778-016-0448-z","volume":"26","author":"M Yang","year":"2017","unstructured":"Yang, M., Shkapsky, A., Zaniolo, C.: Scaling up the performance of more powerful datalog systems on multicore machines. VLDB J. 26(2), 229\u2013248 (2017)","journal-title":"VLDB J."},{"key":"665_CR20","doi-asserted-by":"crossref","unstructured":"Syed, U., Vassilvitskii, S.: SQML: large-scale in-database machine learning with pure SQL. In: ACM Symposium on Cloud Computing, (SoCC), 659, (2017)","DOI":"10.1145\/3127479.3132746"},{"key":"665_CR21","volume-title":"Advanced Database Systems","author":"C Zaniolo","year":"1997","unstructured":"Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R.T., Subrahmanian, V.S., Zicari, R.: Advanced Database Systems. Morgan Kaufmann, Massachusetts (1997)"},{"key":"665_CR22","unstructured":"LIBSVM Data . https:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvmtools\/datasets\/"},{"key":"665_CR23","doi-asserted-by":"crossref","unstructured":"Ganguly, S., Greco, S., Zaniolo, C.: Minimum and maximum predicates in logic programming. In: ACM Symposium on Principles of Database Systems (PODS), 154\u2013163, (1991)","DOI":"10.1145\/113413.113427"},{"key":"665_CR24","unstructured":"Sudarshan, S., Ramakrishnan, R.: Aggregation and relevance in deductive databases. In: International Conference on Very Large Data Bases (VLDB), 501\u2013511, (1991)"},{"key":"665_CR25","unstructured":"Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: International Conference on Logic Programming (ICLP), 1070\u20131080, (1988)"},{"key":"665_CR26","doi-asserted-by":"crossref","unstructured":"Gu, J., Watanabe, Y., Mazza, W., Shkapsky, A., Yang, M., Ding, L., Zaniolo, C.: Rasql: Greater power and performance for big data analytics with recursive-aggregate-sql on spark. In: ACM International Conference on Management of Data, SIGMOD Conference, 467\u2013484, (2019)","DOI":"10.1145\/3299869.3324959"},{"key":"665_CR27","doi-asserted-by":"crossref","unstructured":"Das, A., Li, Y., Wang, J., Li, M., Zaniolo, C.: Bigdata applications from graph analytics to machine learning by aggregates in recursion. In: International Conference on Logic Programming (ICLP), pages 273\u2013279, (2019)","DOI":"10.4204\/EPTCS.306.32"},{"issue":"5\u20136","key":"665_CR28","doi-asserted-by":"publisher","first-page":"1056","DOI":"10.1017\/S1471068419000358","volume":"19","author":"A Das","year":"2019","unstructured":"Das, A., Zaniolo, C.: A case for stale synchronous distributed model for declarative recursive computation. Theory Pract. Log. Program. 19(5\u20136), 1056\u20131072 (2019)","journal-title":"Theory Pract. Log. Program."},{"key":"665_CR29","unstructured":"Zaniolo, C., Das, A., Gu, J., Li, Y., Li, M., Wang, J.: Monotonic properties of completed aggregates in recursive queries. CoRR, arXiv:1910.08888, (2019)"},{"issue":"1","key":"665_CR30","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1017\/S1471068402001515","volume":"3","author":"F Arni","year":"2003","unstructured":"Arni, F., Ong, K., Tsur, S., Wang, H., Zaniolo, C.: The deductive database system LDL++. Theory Pract. Log. Program. 3(1), 61\u201394 (2003)","journal-title":"Theory Pract. Log. Program."},{"key":"665_CR31","unstructured":"Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: USENIX Symposium on Networked Systems Design and Implementation (NSDI), 15\u201328, (2012)"},{"key":"665_CR32","doi-asserted-by":"crossref","unstructured":"Wolfson, O., Silberschatz, A.: Distributed processing of logic programs. In: ACM International Conference on Management of Data, SIGMOD Conference, 329\u2013336, (1988)","DOI":"10.1145\/971701.50242"},{"key":"665_CR33","doi-asserted-by":"crossref","unstructured":"Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Identifying suspicious urls: an application of large-scale online learning. In: International Conference on Machine Learning (ICML), 681\u2013688, (2009)","DOI":"10.1145\/1553374.1553462"},{"key":"665_CR34","doi-asserted-by":"crossref","unstructured":"Juan, Y., Zhuang, Y., Chin, W., Lin, C.: Field-aware factorization machines for CTR prediction. In: ACM Conference on Recommender Systems (RecSys), 43\u201350, (2016)","DOI":"10.1145\/2959100.2959134"},{"key":"665_CR35","unstructured":"Webb, S., Caverlee, J., Pu, C.: Introducing the webb spam corpus: using email spam to identify web spam automatically. In: The Third Conference on Email and Anti-Spam (CEAS), (2006)"},{"issue":"13","key":"665_CR36","doi-asserted-by":"publisher","first-page":"1425","DOI":"10.14778\/3007263.3007279","volume":"9","author":"M Boehm","year":"2016","unstructured":"Boehm, M., Dusenberry, M., Eriksson, D., Evfimievski, A.V., Manshadi, F.M., Pansare, N., Reinwald, B., Reiss, F., Sen, P., Surve, A., Tatikonda, S.: SystemML: declarative machine learning on spark. Proc. VLDB Endow. 9(13), 1425\u20131436 (2016)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR37","doi-asserted-by":"crossref","unstructured":"Makrynioti, N., Vasiloglou, N., Pasalic, E., Vassalos, V.: Modelling machine learning algorithms on relational data with datalog. In: DEEM@ACM International Conference on Management of Data, SIGMOD Conference, 5:1\u20135:4, (2018)","DOI":"10.1145\/3209889.3209893"},{"key":"665_CR38","unstructured":"Bu, Y., Borkar, V.R., Carey, M.J., Rosen, J., Polyzotis, N., Condie, T., Weimer, M., Ramakrishnan, R.: Scaling datalog for machine learning on big data. CoRR, arXiv:1203.0160, (2012)"},{"issue":"13","key":"665_CR39","doi-asserted-by":"publisher","first-page":"2168","DOI":"10.14778\/3275366.3284963","volume":"11","author":"A Thomas","year":"2018","unstructured":"Thomas, A., Kumar, A.: A comparative evaluation of systems for scalable linear algebra-based analytics. Proc. VLDB Endow. 11(13), 2168\u20132182 (2018)","journal-title":"Proc. VLDB Endow."},{"issue":"2","key":"665_CR40","first-page":"24","volume":"35","author":"VR Borkar","year":"2012","unstructured":"Borkar, V.R., Bu, Y., Carey, M.J., Rosen, J., Polyzotis, N., Condie, T., Weimer, M., Ramakrishnan, R.: Declarative systems for large-scale machine learning. IEEE Data Eng. Bull. 35(2), 24\u201332 (2012)","journal-title":"IEEE Data Eng. Bull."},{"key":"665_CR41","unstructured":"Mumick, I.S, Pirahesh, H., Ramakrishnan, R.: The magic of duplicates and aggregates. In: International Conference on Very Large Data Bases (VLDB), 264\u2013277, (1990)"},{"issue":"5","key":"665_CR42","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1016\/S0306-4379(02)00006-6","volume":"27","author":"F Furfaro","year":"2002","unstructured":"Furfaro, F., Greco, S., Ganguly, S., Zaniolo, C.: Pushing extrema aggregates to optimize logic queries. Inf. Syst. 27(5), 321\u2013343 (2002)","journal-title":"Inf. Syst."},{"key":"665_CR43","doi-asserted-by":"crossref","unstructured":"Ross, K.A., Sagiv, Y.: Monotonic aggregation in deductive databases. In: ACM Symposium on Principles of Database Systems (PODS), 114\u2013126, (1992)","DOI":"10.1145\/137097.137852"},{"issue":"1","key":"665_CR44","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1006\/jcss.1997.1453","volume":"54","author":"KA Ross","year":"1997","unstructured":"Ross, K.A., Sagiv, Y.: Monotonic aggregation in deductive database. J. Comput. Syst. Sci. 54(1), 79\u201397 (1997)","journal-title":"J. Comput. Syst. Sci."},{"issue":"2","key":"665_CR45","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1006\/jcss.1995.1064","volume":"51","author":"S Ganguly","year":"1995","unstructured":"Ganguly, S., Greco, S., Zaniolo, C.: Extrema predicates in deductive databases. J. Comput. Syst. Sci. 51(2), 244\u2013259 (1995)","journal-title":"J. Comput. Syst. Sci."},{"key":"665_CR46","doi-asserted-by":"crossref","unstructured":"Wang, J., Xiao, G., Gu, J., Wu, J., Zaniolo, C.: RASQL: A powerful language and its system for big data applications. In: ACM International Conference on Management of Data, SIGMOD Conference, 2673\u20132676, (2020)","DOI":"10.1145\/3318464.3384677"},{"key":"665_CR47","doi-asserted-by":"crossref","unstructured":"Seib, J., Lausen, G.: Parallelizing datalog programs by generalized pivoting. In: ACM Symposium on Principles of Database Systems (PODS), 241\u2013251, (1991)","DOI":"10.1145\/113413.113435"},{"key":"665_CR48","doi-asserted-by":"crossref","unstructured":"Shaw, M., Koutris, P., Howe, B., Suciu, D.: Optimizing large-scale semi-na\u00efve datalog evaluation in hadoop. Datalog Acad. Ind. 165\u2013176, (2012)","DOI":"10.1007\/978-3-642-32925-8_17"},{"key":"665_CR49","doi-asserted-by":"crossref","unstructured":"Loo, B.T., Condie, T., Hellerstein, J.M., Maniatis, P., Roscoe, T., Stoica, I.: Implementing declarative overlays. In: ACM Symposium on Operating Systems Principles (SOSP), 75\u201390, (2005)","DOI":"10.1145\/1095809.1095818"},{"key":"665_CR50","doi-asserted-by":"crossref","unstructured":"Loo, B.T., Condie, T., Garofalakis, M.N., Gay, D.E., Hellerstein, J.M., Maniatis, P., Ramakrishnan, R., Roscoe, T., Stoica, I.: Declarative networking: language, execution and optimization. In: ACM International Conference on Management of Data, SIGMOD Conference, 97\u2013108, (2006)","DOI":"10.1145\/1142473.1142485"},{"key":"665_CR51","unstructured":"Seo, J., Guo, S., Lam, M.\u00a0S.: Socialite: Datalog extensions for efficient social network analysis. In: IEEE International Conference on Data Engineering (ICDE), 278\u2013289, (2013)"},{"key":"665_CR52","doi-asserted-by":"crossref","unstructured":"Aref, M., Cate, B.\u00a0ten, Green, T.\u00a0J., Kimelfeld, B., Olteanu, D., Pasalic, E., Veldhuizen, T.\u00a0L., Washburn, G.: Design and implementation of the logicblox system. In: ACM International Conference on Management of Data, SIGMOD Conference, 1371\u20131382, (2015)","DOI":"10.1145\/2723372.2742796"},{"issue":"12","key":"665_CR53","doi-asserted-by":"publisher","first-page":"1542","DOI":"10.14778\/2824032.2824052","volume":"8","author":"J Wang","year":"2015","unstructured":"Wang, J., Balazinska, M., Halperin, D.: Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. Proc. VLDB Endow. 8(12), 1542\u20131553 (2015)","journal-title":"Proc. VLDB Endow."},{"issue":"12","key":"665_CR54","doi-asserted-by":"publisher","first-page":"960","DOI":"10.14778\/2994509.2994515","volume":"9","author":"A Elgohary","year":"2016","unstructured":"Elgohary, A., Boehm, M., Haas, P.J., Reiss, F.R., Reinwald, B.: Compressed linear algebra for large-scale machine learning. Proc. VLDB Endow. 9(12), 960\u2013971 (2016)","journal-title":"Proc. VLDB Endow."},{"issue":"5","key":"665_CR55","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1007\/s00778-017-0478-1","volume":"27","author":"A Elgohary","year":"2018","unstructured":"Elgohary, A., Boehm, M., Haas, P.J., Reiss, F.R., Reinwald, B.: Compressed linear algebra for large-scale machine learning. VLDB J. 27(5), 719\u2013744 (2018)","journal-title":"VLDB J."},{"issue":"11","key":"665_CR56","doi-asserted-by":"publisher","first-page":"1214","DOI":"10.14778\/3137628.3137633","volume":"10","author":"L Chen","year":"2017","unstructured":"Chen, L., Kumar, A., Naughton, J.F., Patel, J.M.: Towards linear algebra over normalized data. Proc. VLDB Endow. 10(11), 1214\u20131225 (2017)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR57","unstructured":"Elgamal, T., Luo, S., Boehm, M., Evfimievski, A.V., Tatikonda, S., Reinwald, B., Sen, P.: SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In Innov. Data Syst. Res. (CIDR), (2017)"},{"key":"665_CR58","doi-asserted-by":"crossref","unstructured":"Schleich, M., Olteanu, D., Ciucanu, R.: Learning linear regression models over factorized joins. In: ACM International Conference on Management of Data, SIGMOD Conference, 3\u201318, (2016)","DOI":"10.1145\/2882903.2882939"},{"key":"665_CR59","doi-asserted-by":"crossref","unstructured":"Cai, Z., Vagena, Z., Perez, L.L., Arumugam, S., Haas, P.J., Jermaine, C.M.: Simulation of database-valued markov chains using simsql. In: ACM International Conference on Management of Data, SIGMOD Conference, 637\u2013648, (2013)","DOI":"10.1145\/2463676.2465283"},{"issue":"7","key":"665_CR60","doi-asserted-by":"publisher","first-page":"1224","DOI":"10.1109\/TKDE.2018.2827988","volume":"31","author":"S Luo","year":"2019","unstructured":"Luo, S., Gao, Z.J., Gubanov, M.N., Perez, L.L., Jermaine, C.M.: Scalable linear algebra on a relational database system. IEEE Trans. Knowl. Data Eng. 31(7), 1224\u20131238 (2019)","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"665_CR61","doi-asserted-by":"crossref","unstructured":"Gao, Z.J., Luo, S., Perez, L.L., Jermaine, C.: The BUDS language for distributed bayesian machine learning. In: ACM International Conference on Management of Data, SIGMOD Conference, 961\u2013976, (2017)","DOI":"10.1145\/3035918.3035937"},{"key":"665_CR62","unstructured":"Kraska, T., Talwalkar, A., Duchi, J.C., Griffith, R., Franklin, M.J., Jordan, M.I.: Mlbase: A distributed machine-learning system. Innov. Data Syst. Res. (CIDR)"},{"issue":"8","key":"665_CR63","doi-asserted-by":"publisher","first-page":"901","DOI":"10.14778\/3090163.3090168","volume":"10","author":"MJ Anderson","year":"2017","unstructured":"Anderson, M.J., Smith, S., Sundaram, N., Capota, M., Zhao, Z., Dulloor, S., Satish, N., Willke, T.L.: Bridging the gap between HPC and big data frameworks. Proc. VLDB Endow. 10(8), 901\u2013912 (2017)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR64","doi-asserted-by":"crossref","unstructured":"Sparks, E.R., Venkataraman, S., Kaftan, T., Franklin, M.J., Recht, B.: Keystoneml: Optimizing pipelines for large-scale advanced analytics. In: IEEE International Conference on Data Engineering (ICDE), 535\u2013546, (2017)","DOI":"10.1109\/ICDE.2017.109"},{"issue":"4","key":"665_CR65","doi-asserted-by":"publisher","first-page":"446","DOI":"10.14778\/3297753.3297763","volume":"12","author":"D Xin","year":"2018","unstructured":"Xin, D., Macke, S., Ma, L., Liu, J., Song, S., Parameswaran, A.G.: Helix: holistic optimization for accelerating iterative machine learning. Proc. VLDB Endow. 12(4), 446\u2013460 (2018)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR66","doi-asserted-by":"crossref","unstructured":"Kaoudi, Z., Quian\u00e9-Ruiz, J., Thirumuruganathan, S., Chawla, S., Agrawal, D.: A cost-based optimizer for gradient descent optimization. In: ACM International Conference on Management of Data, SIGMOD Conference, 977\u2013992, (2017)","DOI":"10.1145\/3035918.3064042"},{"key":"665_CR67","unstructured":"Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, \u00da., Gunda, P.K., Currey, J.: Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1\u201314, (2008)"},{"issue":"11","key":"665_CR68","doi-asserted-by":"publisher","first-page":"1280","DOI":"10.14778\/2350229.2350246","volume":"5","author":"SR Mihaylov","year":"2012","unstructured":"Mihaylov, S.R., Ives, Z.G., Guha, S.: REX: recursive, delta-based data-centric computation. Proc. VLDB Endow. 5(11), 1280\u20131291 (2012)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR69","unstructured":"McSherry, F., Murray, D.G, Isaacs, R., Isard, M.: Differential dataflow. Innov. Data Syst. Res. (CIDR)"},{"issue":"8","key":"665_CR70","doi-asserted-by":"publisher","first-page":"716","DOI":"10.14778\/2212351.2212354","volume":"5","author":"Y Low","year":"2012","unstructured":"Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. Proc. VLDB Endow. 5(8), 716\u2013727 (2012)","journal-title":"Proc. VLDB Endow."},{"key":"665_CR71","doi-asserted-by":"crossref","unstructured":"Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: ACM International Conference on Management of Data, SIGMOD Conference, 135\u2013146, (2010)","DOI":"10.1145\/1807167.1807184"},{"key":"665_CR72","unstructured":"Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI), 599\u2013613, (2014)"},{"key":"665_CR73","unstructured":"Li, M., Andersen, D.G., Park, J.W., Smola, A.J., Ahmed, A., Josifovski, V., Long, J., Shekita, E.J., Su, B.: Scaling distributed machine learning with the parameter server. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI), 583\u2013598, (2014)"},{"key":"665_CR74","unstructured":"Steiner, B., DeVito, Z., Chintala, S., et\u00a0al.: Pytorch: An imperative style, high-performance deep learning library. In: Annual Conference on Neural Information Processing Systems (NeurIPS), (2019)"},{"key":"665_CR75","unstructured":"Abadi, M., Barham, P., Chen, J., et\u00a0al.: Tensorflow: a system for large-scale machine learning. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI), 265\u2013283, (2016)"},{"key":"665_CR76","doi-asserted-by":"crossref","unstructured":"Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A., Yu, Y.: Petuum: A new platform for distributed machine learning on big data. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1335\u20131344, (2015)","DOI":"10.1145\/2783258.2783323"},{"key":"665_CR77","unstructured":"Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang,Z.: Mxnet: a flexible and efficient machine learning library for heterogeneous distributed systems. CoRR, arXiv:1512.01274, (2015)"},{"key":"665_CR78","doi-asserted-by":"crossref","unstructured":"Schleich, M., Olteanu, D., Khamis, M.A., Ngo, H.Q., Nguyen, X.: A layered aggregate engine for analytics workloads. In: ACM International Conference on Management of Data, SIGMOD Conference, 1642\u20131659, (2019)","DOI":"10.1145\/3299869.3324961"},{"key":"665_CR79","unstructured":"Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M.I., Stoica, I.: Ray: a distributed framework for emerging AI applications. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI), 561\u2013577, (2018)"}],"container-title":["The VLDB Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-021-00665-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00778-021-00665-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-021-00665-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T10:32:34Z","timestamp":1628591554000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00778-021-00665-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,31]]},"references-count":79,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["665"],"URL":"https:\/\/doi.org\/10.1007\/s00778-021-00665-6","relation":{},"ISSN":["1066-8888","0949-877X"],"issn-type":[{"value":"1066-8888","type":"print"},{"value":"0949-877X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,31]]},"assertion":[{"value":"26 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}