{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:08:44Z","timestamp":1760058524613,"version":"build-2065373602"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA1","license":[{"start":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T00:00:00Z","timestamp":1744156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/W007940\/1"],"award-info":[{"award-number":["EP\/W007940\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,4,9]]},"abstract":"<jats:p>Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra and efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework to propagate the tensors's captured high-level structure down to low-level code generation by incorporating techniques such as automatic data layout compression, polyhedral analysis, and affine code generation. Our methodology reduces memory footprint by automatically detecting the best data layout, heavily benefits from polyhedral optimizations, leverages further optimizations, and enables parallelization through MLIR. Through extensive experimentation, we show that DASTAC can compete if not significantly outperform specialized hand-tuned implementation by experts. We observe 0.16x - 44.83x and 1.37x - 243.78x speed-up for single- and multi-threaded cases, respectively.<\/jats:p>","DOI":"10.1145\/3720506","type":"journal-article","created":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T13:48:26Z","timestamp":1744206506000},"page":"1717-1745","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Compressed and Parallelized Structured Tensor Algebra"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3686-2158","authenticated-orcid":false,"given":"Mahdi","family":"Ghorbani","sequence":"first","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8028-3064","authenticated-orcid":false,"given":"Emilien","family":"Bauer","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3874-6003","authenticated-orcid":false,"given":"Tobias","family":"Grosser","sequence":"additional","affiliation":[{"name":"University of Cambridge, Cambridge, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9062-759X","authenticated-orcid":false,"given":"Amir","family":"Shaikhha","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916). USENIX Association, USA. 265\u2013283. isbn:9781931971331"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.16730"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3579990.3580020"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.04900"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314615"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3591236"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2004.10018"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_10_1","volume-title":"Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang.","author":"Bradbury James","year":"2018","unstructured":"James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http:\/\/github.com\/google\/jax"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: an automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, USA. 579\u2013594. isbn:9781931971478"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126936"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3563338"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276493"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/346023.346031"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00087-9"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2112.00029"},{"volume-title":"From matrix to tensor: Multilinear algebra and signal processing","author":"Lathauwer Lieven De","key":"e_1_2_1_18_1","unstructured":"Lieven De Lathauwer and Bart De Moor. 1998. From matrix to tensor: Multilinear algebra and signal processing. In Institute of mathematics and its applications conference series. 67, 1\u201316."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.839989"},{"key":"e_1_2_1_21_1","volume-title":"Workshop on Open-Source EDA Technology (WOSET).","author":"Eldridge Schuyler","year":"2021","unstructured":"Schuyler Eldridge, Prithayan Barua, Aliaksei Chapyzhenka, Adam Izraelevitz, Jack Koenig, Chris Lattner, Andrew Lenharth, George Leontiev, Fabian Schuiki, Ram Sunder, et al. 2021. MLIR as hardware compiler infrastructure. In Workshop on Open-Source EDA Technology (WOSET)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3291041"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3235029"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3622804"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358291"},{"key":"e_1_2_1_26_1","volume-title":"Moses","author":"Govindarajan Sanath","year":"2020","unstructured":"Sanath Govindarajan and William S. Moses. 2020. SyFER-MLIR: Integrating Fully Homomorphic Encryption Into the MLIR Compiler Framework."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626412500107"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1080\/08898488909525291"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2649-2"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358275"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2467268"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586329"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589350"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00817-w"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3375661"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3375661"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1515\/9783110365832"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00354-023-00226-1"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_2_1_40_1","unstructured":"Innovative Computing Laboratory. 2023. The Performance Application Programming Interface (PAPI). https:\/\/icl.utk.edu\/papi\/"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013535431127"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3623278.3624767"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1511.05946"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT52795.2021.00011"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786763.2694364"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2005.12.016"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315454.3329958"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2010.2058802"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2212.05159"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2021.3074329"},{"volume-title":"PyTorch: an imperative style, high-performance deep learning library","author":"Paszke Adam","key":"e_1_2_1_52_1","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: an imperative style, high-performance deep learning library. Curran Associates Inc., Red Hook, NY, USA."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588717"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO57630.2024.10444787"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3527333"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2017.2690524"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854060"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14080"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-019-47858-2"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1802.04730"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888390.1888455"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","unstructured":"Sven Verdoolaege Rachid Seghir Kristof Beyls Vincent Loechner and Maurice Bruynooghe. 2007-05-01. Counting integer points in parametric polytopes using Barvinok\u2019s rational functions. 37 - 66 pages. issn:1432-0541 https:\/\/doi.org\/10.1007\/s00453-006-1231-0 10.1007\/s00453-006-1231-0","DOI":"10.1007\/s00453-006-1231-0"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0686-2"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","unstructured":"Endong Wang Qing Zhang Bo Shen Guangyong Zhang Xiaowei Lu Qing Wu and Yajuan Wang. 2014. Intel Math Kernel Library. 167\u2013188. isbn:978-3-319-06486-4 https:\/\/doi.org\/10.1007\/978-3-319-06486-4_7 10.1007\/978-3-319-06486-4_7","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.496"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3591302"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720506","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3720506","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:14:48Z","timestamp":1760030088000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,9]]},"references-count":68,"journal-issue":{"issue":"OOPSLA1","published-print":{"date-parts":[[2025,4,9]]}},"alternative-id":["10.1145\/3720506"],"URL":"https:\/\/doi.org\/10.1145\/3720506","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2025,4,9]]},"assertion":[{"value":"2024-10-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}