{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:25:44Z","timestamp":1773887144451,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2016,8]]},"abstract":"<jats:p>Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I\/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Hence, we initiate work on compressed linear algebra (CLA), in which lightweight database compression techniques are applied to matrices and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representations. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios that allow us to fit larger datasets into available memory. We thereby obtain significant end-to-end performance improvements up to 26x or reduced memory requirements.<\/jats:p>","DOI":"10.14778\/2994509.2994515","type":"journal-article","created":{"date-parts":[[2016,9,6]],"date-time":"2016-09-06T15:27:03Z","timestamp":1473175623000},"page":"960-971","source":"Crossref","is-referenced-by-count":48,"title":["Compressed linear algebra for large-scale machine learning"],"prefix":"10.14778","volume":"9","author":[{"given":"Ahmed","family":"Elgohary","sequence":"first","affiliation":[{"name":"University of Maryland, College Park, MD"}]},{"given":"Matthias","family":"Boehm","sequence":"additional","affiliation":[{"name":"IBM Research - Almaden, San Jose, CA"}]},{"given":"Peter J.","family":"Haas","sequence":"additional","affiliation":[{"name":"IBM Research - Almaden, San Jose, CA"}]},{"given":"Frederick R.","family":"Reiss","sequence":"additional","affiliation":[{"name":"IBM Research - Almaden, San Jose, CA"}]},{"given":"Berthold","family":"Reinwald","sequence":"additional","affiliation":[{"name":"IBM Research - Almaden, San Jose, CA"}]}],"member":"320","published-online":{"date-parts":[[2016,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"CoRR","author":"Abadi M.","year":"2016","unstructured":"M. Abadi : Large-Scale Machine Learning on Heterogeneous Distributed Systems . CoRR , 2016 . M. Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, 2016."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0357-y"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597652.2597678"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688521"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1985.231852"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-92bf1922-003"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247504"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687573"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2618243.2618268"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559877"},{"key":"e_1_2_1_12_1","volume-title":"CoRR","author":"Boehm M.","year":"2016","unstructured":"M. Boehm Declarative Machine Learning -- A Classification of Basic Properties and Types . CoRR , 2016 . M. Boehm et al. Declarative Machine Learning -- A Classification of Basic Properties and Types. CoRR, 2016."},{"key":"e_1_2_1_13_1","unstructured":"L. Bottou. The infinite MNIST dataset. http:\/\/leon.bottou.org\/projects\/infimnist.  L. Bottou. The infinite MNIST dataset. http:\/\/leon.bottou.org\/projects\/infimnist."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335230"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020558"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687576"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCP.2011.41"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/214956.214963"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807275"},{"key":"e_1_2_1_20_1","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat . MapReduce: Simplified Data Processing on Large Clusters . In OSDI , 2004 . J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767930"},{"key":"e_1_2_1_22_1","volume-title":"Biometrika","author":"Good I. J.","year":"1953","unstructured":"I. J. Good . The Population Frequencies of Species and the Estimation of Population Parameters . Biometrika , 1953 . I. J. Good. The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika, 1953."},{"key":"e_1_2_1_23_1","volume-title":"Applied Computing","author":"Graefe G.","year":"1991","unstructured":"G. Graefe and L. D. Shapiro . Data Compression and Database Performance . In Applied Computing , 1991 . G. Graefe and L. D. Shapiro. Data Compression and Database Performance. In Applied Computing, 1991."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1998.10473807"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232381"},{"key":"e_1_2_1_26_1","volume-title":"FAST","author":"Harnik D.","year":"2013","unstructured":"D. Harnik To Zip or not to Zip: Effective Resource Usage for Real-Time Compression . In FAST , 2013 . D. Harnik et al. To Zip or not to Zip: Effective Resource Usage for Real-Time Compression. In FAST, 2013."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465273"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749432"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447871"},{"key":"e_1_2_1_30_1","volume-title":"Univariate Discrete Distributions","author":"Johnson N. L.","year":"1992","unstructured":"N. L. Johnson Univariate Discrete Distributions . Wiley , New York , 2 nd edition, 1992 . N. L. Johnson et al. Univariate Discrete Distributions. Wiley, New York, 2nd edition, 1992.","edition":"2"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2012.290"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2618243.2618254"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2021017.2021023"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366230.1366244"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882925"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989448"},{"key":"e_1_2_1_37_1","volume-title":"Covertype, US Census","author":"Lichman M.","year":"1990","unstructured":"M. Lichman . UCI Machine Learning Repository: Higgs , Covertype, US Census ( 1990 ). archive.ics.uci.edu\/ml\/. M. Lichman. UCI Machine Learning Repository: Higgs, Covertype, US Census (1990). archive.ics.uci.edu\/ml\/."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/645575.658338"},{"key":"e_1_2_1_39_1","first-page":"1","year":"2007","unstructured":"Oracle. Data Warehousing Guide, 11g Release 1 , 2007 . Oracle. Data Warehousing Guide, 11g Release 1, 2007.","journal-title":"Oracle. Data Warehousing Guide, 11g Release"},{"key":"e_1_2_1_40_1","volume-title":"VLDB","author":"Raman V.","year":"2006","unstructured":"V. Raman and G. Swart . How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations . In VLDB , 2006 . V. Raman and G. Swart. How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations. In VLDB, 2006."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536233"},{"key":"e_1_2_1_42_1","first-page":"2","author":"Saad Y.","year":"1994","unstructured":"Y. Saad . SPARSKIT: a basic tool kit for sparse matrix computations - Version 2 , 1994 . Y. Saad. SPARSKIT: a basic tool kit for sparse matrix computations - Version 2, 1994.","journal-title":"Version"},{"key":"e_1_2_1_43_1","volume-title":"VLDB","author":"Stonebraker M.","year":"2005","unstructured":"M. Stonebraker : A Column-oriented DBMS . In VLDB , 2005 . M. Stonebraker et al. C-Store: A Column-oriented DBMS. In VLDB, 2005."},{"key":"e_1_2_1_44_1","volume-title":"SSDBM","author":"Stonebraker M.","year":"2011","unstructured":"M. Stonebraker The Architecture of SciDB . In SSDBM , 2011 . M. Stonebraker et al. The Architecture of SciDB. In SSDBM, 2011."},{"key":"e_1_2_1_45_1","unstructured":"Sysbase. IQ 15.4 System Administration Guide 2013.  Sysbase. IQ 15.4 System Administration Guide 2013."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993636.1993727"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/362084.362137"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362674"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132863.1132864"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723712"},{"key":"e_1_2_1_51_1","volume-title":"NSDI","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing . In NSDI , 2012 . M. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI, 2012."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593678"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2994509.2994515","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:52:15Z","timestamp":1672224735000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2994509.2994515"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8]]},"references-count":52,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2016,8]]}},"alternative-id":["10.14778\/2994509.2994515"],"URL":"https:\/\/doi.org\/10.14778\/2994509.2994515","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2016,8]]}}}