{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T05:36:04Z","timestamp":1768109764274,"version":"3.49.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,8]]},"abstract":"<jats:p>\n            Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all ML tools require the input data to be presented as a single table, many datasets are multi-table. This forces data scientists to join those tables first, which often leads to data redundancy and runtime waste. Recent works on \"factorized\" ML mitigate this issue for a few specific ML algorithms by pushing ML through joins. But their approaches require a\n            <jats:italic>manual<\/jats:italic>\n            rewrite of ML implementations. Such piecemeal methods create a massive development overhead when extending such ideas to other ML algorithms. In this paper, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. We introduce a new logical data type to represent normalized data and devise a framework of algebraic rewrite rules to convert a large set of linear algebra operations over denormalized data into operations over normalized data. We show how this enables us to\n            <jats:italic>automatically<\/jats:italic>\n            \"factorize\" several popular ML algorithms, thus unifying and generalizing several prior works. We prototype our framework in the popular ML environment R and an industrial R-over-RDBMS tool. 
Experiments with both synthetic and real normalized data show that our framework also yields significant speed-ups, up to 36x on real data.\n          <\/jats:p>","DOI":"10.14778\/3137628.3137633","type":"journal-article","created":{"date-parts":[[2017,9,7]],"date-time":"2017-09-07T13:35:53Z","timestamp":1504791353000},"page":"1214-1225","source":"Crossref","is-referenced-by-count":52,"title":["Towards linear algebra over normalized data"],"prefix":"10.14778","volume":"10","author":[{"given":"Lingjiao","family":"Chen","sequence":"first","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Arun","family":"Kumar","sequence":"additional","affiliation":[{"name":"University of California"}]},{"given":"Jeffrey","family":"Naughton","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Jignesh M.","family":"Patel","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]}],"member":"320","published-online":{"date-parts":[[2017,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Matlab mmtimes function. mathworks.com\/matlabcentral\/fileexchange\/27950-mmtimes-matrix-chain-product."},{"key":"e_1_2_1_2_1","unstructured":"Oracle R Enterprise."},{"key":"e_1_2_1_3_1","unstructured":"R. r-project.org."},{"key":"e_1_2_1_4_1","unstructured":"SparkR. 
spark.apache.org\/R."},{"key":"e_1_2_1_5_1","volume-title":"OSDI","author":"Abadi M.","year":"2016"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898719604","volume-title":"LAPACK Users' Guide","author":"Anderson E.","year":"1999"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350242"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732286.2732292"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007279"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465283"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593680"},{"key":"e_1_2_1_12_1","volume-title":"VLDB","author":"Chaudhuri S.","year":"1994"},{"key":"e_1_2_1_13_1","unstructured":"L. Chen et al. Towards Linear Algebra over Normalized Data. https:\/\/arxiv.org\/abs\/1612.07448."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687584"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/42288.42291"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898718867","volume-title":"Matrix Methods in Data Mining and Pattern Recognition","author":"Eld\u00e9n L.","year":"2007"},{"key":"e_1_2_1_17_1","volume-title":"CIDR","author":"Elgamal T.","year":"2017"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994515"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213874"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367510"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139020411","volume-title":"Matrix Analysis","author":"Horn R. A.","year":"2012","edition":"2"},{"issue":"2","key":"e_1_2_1_22_1","first-page":"362","volume":"11","author":"Hu T. 
C.","year":"1982","journal-title":"Computation of Matrix Chain Products. Part I. SIAM J. Comput."},{"key":"e_1_2_1_23_1","volume-title":"CIDR","author":"Kraska T.","year":"2013"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824087"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723713"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2935694.2935698"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882952"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2926534.2926540"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355847"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610519"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007312"},{"key":"e_1_2_1_32_1","volume-title":"Database Management Systems","author":"Ramakrishnan R.","year":"2003"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535573.2488340"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882939"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/42201.42203"},{"key":"e_1_2_1_36_1","volume-title":"VLDB","author":"Yan W. 
P.","year":"1995"},{"key":"e_1_2_1_37_1","volume-title":"ICDE","author":"Zhang Y.","year":"2010"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3137628.3137633","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:57:13Z","timestamp":1672221433000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3137628.3137633"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,8]]},"references-count":37,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2017,8]]}},"alternative-id":["10.14778\/3137628.3137633"],"URL":"https:\/\/doi.org\/10.14778\/3137628.3137633","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,8]]}}}