{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,20]],"date-time":"2026-06-20T16:22:39Z","timestamp":1781972559964,"version":"3.54.5"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2020,11,13]],"date-time":"2020-11-13T00:00:00Z","timestamp":1605225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004750","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1533753"],"award-info":[{"award-number":["CCF-1533753"]}],"id":[{"id":"10.13039\/501100004750","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010690","name":"Toyota Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100010690","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Applications Driving Architectures (ADA) Research Center"},{"DOI":"10.13039\/100011735","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-SC0018121"],"award-info":[{"award-number":["DE-SC0018121"]}],"id":[{"id":"10.13039\/100011735","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["HR0011-18-3-0007"],"award-info":[{"award-number":["HR0011-18-3-0007"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2020,11,13]]},"abstract":"<jats:p>We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations---split, collapse, and reorder---on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs.<\/jats:p>\n          <jats:p>We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used by performance engineers or it can be the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code with competitive performance to hand-optimized implementations from the literature, while generalizing to all of the tensor algebra.<\/jats:p>","DOI":"10.1145\/3428226","type":"journal-article","created":{"date-parts":[[2020,11,24]],"date-time":"2020-11-24T23:40:14Z","timestamp":1606261214000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":49,"title":["A sparse iteration space transformation framework for sparse tensor algebra"],"prefix":"10.1145","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0558-2164","authenticated-orcid":false,"given":"Ryan","family":"Senanayake","sequence":"first","affiliation":[{"name":"Reservoir Labs, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Changwan","family":"Hong","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ziheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Amalee","family":"Wilson","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stephen","family":"Chou","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shoaib","family":"Kamil","sequence":"additional","affiliation":[{"name":"Adobe Research, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Saman","family":"Amarasinghe","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fredrik","family":"Kjolstad","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,11,13]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/3026877.3026899"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_2_2_3_1","volume-title":"Allen and John Cocke","author":"Frances","year":"1972"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/109626.109631"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1080\/00268970500275780"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_2_2_7_1","first-page":"192","volume-title":"Gelernter D., Gross T., Padua D. (eds) Advances in languages and compilers for parallel computing ( 1991 )","author":"Banerjee Utpal"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2017.8091026"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2014.7041006"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/165939.166023"},{"key":"e_1_2_2_13_1","first-page":"578","volume-title":"Symposium on Operating Systems Design and Implementation. USENIX Association","author":"Chen Tianqi","year":"2018"},{"key":"e_1_2_2_14_1","volume-title":"Advances in Neural Information Processing Systems 31","author":"Chen Tianqi"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276493"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385963"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1002\/jcc.23377"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1051\/ro\/1988220302431"},{"key":"e_1_2_2_20_1","unstructured":"Ga\u00ebl Guennebaud Beno\u00eet Jacob etal 2010. Eigen v3. http:\/\/eigen.tuxfamily.org  Ga\u00ebl Guennebaud Beno\u00eet Jacob et al. 2010. Eigen v3. http:\/\/eigen.tuxfamily.org"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113355"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661185"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115709"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0002751"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/360827.360844"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504194"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.57"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925952"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00023"},{"key":"e_1_2_2_35_1","unstructured":"NVIDIA V10.1.243. 2019. cuSPARSE Software Library. https:\/\/docs.nvidia.com\/cuda\/archive\/10.1\/cusparse\/index.html  NVIDIA V10.1.243. 2019. cuSPARSE Software Library. https:\/\/docs.nvidia.com\/cuda\/archive\/10.1\/cusparse\/index.html"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983990.2984015"},{"key":"e_1_2_2_37_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic diferentiation in PyTorch. ( 2017 ). https:\/\/openreview.net\/pdf?id= BJJsrmfCZ  Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic diferentiation in PyTorch. ( 2017 ). https:\/\/openreview.net\/pdf?id= BJJsrmfCZ"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48319-5_14"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2335383"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_2_41_1","volume-title":"A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra. M. Eng. Thesis","author":"Senanayake Ryan","year":"2020"},{"key":"e_1_2_2_42_1","volume-title":"FROSTT: The Formidable Repository of Open Sparse Tensors and Tools","author":"Smith Shaden","year":"2017"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.27"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.06.002"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863747"},{"key":"e_1_2_2_48_1","volume-title":"Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. Technical Report. 12 pages. arXiv","author":"Vasilache Nicolas","year":"2018"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2738003"},{"key":"e_1_2_2_50_1","volume-title":"Automatic Optimization of Sparse Tensor Algebra Programs. M.Eng. Thesis","author":"Wang Ziheng"},{"key":"e_1_2_2_52_1","volume-title":"Proc. Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers.","author":"Wonnacott David","year":"1995"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-96983-1_48"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428226","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3428226","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3428226","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:57Z","timestamp":1750197777000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428226"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,13]]},"references-count":50,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2020,11,13]]}},"alternative-id":["10.1145\/3428226"],"URL":"https:\/\/doi.org\/10.1145\/3428226","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,13]]},"assertion":[{"value":"2020-11-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}