{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T01:18:35Z","timestamp":1773278315366,"version":"3.50.1"},"reference-count":87,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>\n            Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result tensor. We address this shortcoming through a compiler design that generates code that uses sparse intermediate tensors (sparse workspaces) as efficient adapters between compute code that scatters and result tensors that do not support random insertion. Our compiler automatically detects sparse scattering behavior in tensor expressions and inserts necessary intermediate workspace tensors. We present an algorithm template for workspace insertion that is the backbone of our code generation algorithm. Our algorithm template is modular by design, supporting sparse workspaces that span multiple user-defined implementations. Our evaluation shows that sparse workspaces can be up to\n            <jats:inline-formula>\n              <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"inline\">\n                <mml:mn>27.12<\/mml:mn>\n                <mml:mo>\u00d7<\/mml:mo>\n              <\/mml:math>\n            <\/jats:inline-formula>\n            faster than the dense workspaces of prior work. On the other hand, dense workspaces can be up to\n            <jats:inline-formula>\n              <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"inline\">\n                <mml:mn>7.58<\/mml:mn>\n                <mml:mo>\u00d7<\/mml:mo>\n              <\/mml:math>\n            <\/jats:inline-formula>\n            faster than the sparse workspaces generated by our compiler in other situations, which motivates our compiler design that supports both. Our compiler produces sequential code that is competitive with hand-optimized linear and tensor algebra libraries on the expressions they support, but that generalizes to any other expression. Sparse workspaces are also more memory efficient than dense workspaces as they compress away zeros. This compression can asymptotically decrease memory usage, enabling tensor computations on data that would otherwise run out of memory.\n          <\/jats:p>","DOI":"10.1145\/3656426","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T16:27:20Z","timestamp":1718900840000},"page":"1213-1238","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Compilation of Modular and General Sparse Workspaces"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3866-8167","authenticated-orcid":false,"given":"Genghan","family":"Zhang","sequence":"first","affiliation":[{"name":"Stanford University, Stanford, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4195-8106","authenticated-orcid":false,"given":"Olivia","family":"Hsu","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2267-903X","authenticated-orcid":false,"given":"Fredrik","family":"Kjolstad","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523442"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1137\/13092589X"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926273"},{"key":"e_1_3_1_5_2","author":"Ansel Jason","year":"2024","unstructured":"Jason Ansel, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, et al. 2024. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. (2024).","journal-title":"PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1932681.1863581"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.76"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3591236"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1137\/110838844"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544559"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-57659-2_4"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/113379.113380"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536313"},{"key":"e_1_3_1_14_2","first-page":"578","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578\u2013594."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3276493"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385963"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630350104"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2699470"},{"key":"e_1_3_1_19_2","volume-title":"Proceedings of the GPU technology conference","author":"Demouth Julien","year":"2012","unstructured":"Julien Demouth. 2012. Sparse matrix-matrix multiplication on the GPU. In Proceedings of the GPU technology conference, Vol. 3."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.8"},{"key":"e_1_3_1_21_2","article-title":"Lecture notes on bucket algorithms","volume":"6","author":"Devroye Luc","year":"1986","unstructured":"Luc Devroye. 1986. Lecture notes on bucket algorithms. Progress in computer science 6 (1986).","journal-title":"Progress in computer science"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/231379.231411"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2004.1342540"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1998.681704"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1137\/0613024"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1137\/130948811"},{"key":"e_1_3_1_27_2","article-title":"Eigen","volume":"3","author":"Guennebaud Ga\u00ebl","year":"2010","unstructured":"Ga\u00ebl Guennebaud, Benoit Jacob, et al. 2010. Eigen. URL: http:\/\/eigen.tuxfamily.org 3 (2010).","journal-title":"URL: http:\/\/eigen.tuxfamily.org"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW59300.2023.00118"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"key":"e_1_3_1_30_2","author":"Hellsten Erik","year":"2023","unstructured":"Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, and Luigi Nardi. 2023. BaCO: A Fast and Portable Bayesian Compiler Optimization Framework. arXiv:2212.11142 [cs.PL]","journal-title":"BaCO: A Fast and Portable Bayesian Compiler Optimization Framework"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485505"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582051"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523446"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113355"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719918"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.2172\/7093021"},{"key":"e_1_3_1_37_2","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https:\/\/openreview.net\/forum?id=SJU4ayYgl"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI28928307"},{"key":"e_1_3_1_39_2","first-page":"180","author":"Kjolstad Fredrik","year":"2019","unstructured":"Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor Algebra Compilation with Workspaces. (2019), 180\u2013192. http:\/\/dl.acm.org\/citation.cfm?id=3314872.3314894","journal-title":"Tensor Algebra Compilation with Workspaces"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.89"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.21105\/joss.01244"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0002751"},{"key":"e_1_3_1_45_2","first-page":"25","article-title":"Indexed Streams: A Formal Intermediate Representation for Fused Contraction Programs. Proc","volume":"7","author":"Kovach Scott","year":"2023","unstructured":"Scott Kovach, Praneeth Kolichala, Tiancheng Gu, and Fredrik Kjolstad. 2023. Indexed Streams: A Formal Intermediate Representation for Fused Contraction Programs. Proc. ACM Program. Lang. 7, PLDI, Article 154 (jun 2023), 25 pages.","journal-title":"ACM Program. Lang"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.12694\/scpe.v17i4.1207"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3588195.3593000"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/IA3.2016.010"},{"key":"e_1_3_1_49_2","first-page":"806","article-title":"Sparse convolutional neural networks","author":"Liu Baoyuan","year":"2015","unstructured":"Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. 2015. Sparse convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 806\u2013814.","journal-title":"Proceedings of the IEEE conference on computer vision and pattern recognition"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1137\/1034004"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.47"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3350755.3400245"},{"key":"e_1_3_1_53_2","first-page":"87","volume-title":"International Workshop on Languages and Compilers for Parallel Computing","author":"Mutlu Erdal","year":"2020","unstructured":"Erdal Mutlu, Ruiqin Tian, Bin Ren, Sriram Krishnamoorthy, Roberto Gioiosa, Jacques Pienaar, and Gokcen Kestor. 2020. Comet: A domain-specific compilation of high-performance computational chemistry. In International Workshop on Languages and Compilers for Parallel Computing. Springer, 87\u2013103."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3229710.3229720"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2017.19"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002940"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503221.3508431"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2545664"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374521"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20119-1_4"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623507.3623554"},{"key":"e_1_3_1_63_2","first-page":"213","volume-title":"Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC \u201998)","author":"Pugh William","year":"1998","unstructured":"William Pugh and Tatiana Shpeisman. 1998. SIPR: A New Framework for Generating Efficient Code for Sparse Matrix Computations. In Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC \u201998). Springer-Verlag, Berlin, Heidelberg, 213\u2013229."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.5555\/829576"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3428226"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3527333"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2022.07.005"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/1067649.801719"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2017.20"},{"key":"e_1_3_1_71_2","author":"Smith Shaden","year":"2017","unstructured":"Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017. FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http:\/\/frostt.io\/","journal-title":"FROSTT: The Formidable Repository of Open Sparse Tensors and Tools"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.27"},{"key":"e_1_3_1_74_2","volume-title":"Conference on Domain-Specific Languages (DSL 97)","author":"Stichnoth James M.","year":"1997","unstructured":"James M. Stichnoth and Thomas Gross. 1997. Code Composition as an Implementation Language for Compilers. In Conference on Domain-Specific Languages (DSL 97). USENIX Association, Santa Barbara, CA."},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_3_1_76_2","volume-title":"Generating indexing functions of regularly sparse arrays for array compilers","author":"Thibault Scott","year":"1994","unstructured":"Scott Thibault, Lenore Mullin, and Matt Insall. 1994. Generating indexing functions of regularly sparse arrays for array compilers. Technical Report. Technical Report, University of Missouri, Rolla, 1994, TR 94\u201308X."},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/1592665.1592675"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926291"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295701"},{"key":"e_1_3_1_80_2","article-title":"Unified Convolution Framework: A compiler-based approach to support sparse convolutions","volume":"5","author":"Won Jaeyeon","year":"2023","unstructured":"Jaeyeon Won, Changwan Hong, Charith Mendis, Joel Emer, and Saman Amarasinghe. 2023. Unified Convolution Framework: A compiler-based approach to support sparse convolutions. Proceedings of Machine Learning and Systems 5 (2023).","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485513"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523437"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071080"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"},{"key":"e_1_3_1_85_2","author":"Zhang Genghan","year":"2024","unstructured":"Genghan Zhang, Olivia Hsu, and Fredrik Kjolstad. 2024. Compilation of Modular and General Sparse Workspaces. arXiv:2404.04541 [cs.PL]","journal-title":"Compilation of Modular and General Sparse Workspaces"},{"key":"e_1_3_1_86_2","first-page":"1","article-title":"Sgap: towards efficient sparse tensor algebra compilation for GPU","author":"Zhang Genghan","year":"2023","unstructured":"Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, and Yu Wang. 2023. Sgap: towards efficient sparse tensor algebra compilation for GPU. CCF Transactions on High Performance Computing (2023), 1\u201318.","journal-title":"CCF Transactions on High Performance Computing"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3566054"},{"key":"e_1_3_1_88_2","first-page":"213","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Zheng Ningxin","year":"2022","unstructured":"Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, and Lidong Zhou. 2022. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). 213\u2013232."}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656426","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656426","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T20:43:18Z","timestamp":1751661798000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656426"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":87,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656426"],"URL":"https:\/\/doi.org\/10.1145\/3656426","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}