{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T08:44:18Z","timestamp":1780994658730,"version":"3.54.1"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,10,8]]},"abstract":"<jats:p>This paper extends prior work on sparse tensor algebra compilers to generate asymptotically efficient code for tensor expressions with affine subscript expressions. Our technique enables compiler support for a wide range of sparse computations, including sparse convolutions and pooling that are widely used in ML and graphics applications. We propose an approach that gradually rewrites compound subscript expressions to simple subscript expressions with loops that exploit the sparsity pattern of the input sparse tensors. As a result, the time complexity of the generated kernels is bounded by the number of stored elements and not by the shape of the tensors. Our approach seamlessly integrates into existing frameworks and is compatible with recent advances in compilers for sparse computations, including the flexibility to efficiently handle arbitrary combinations of different sparse tensor formats. The implementation of our algorithm is open source and upstreamed to the MLIR sparse compiler. Experimental results show that our method achieves 19.5x speedup when compared with the state-of-the-art compiler-based method at 99.9% sparsity. The generated sparse kernels start to outperform dense convolution implementations at about 80% sparsity<\/jats:p>","DOI":"10.1145\/3689721","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T03:23:04Z","timestamp":1728357784000},"page":"275-303","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Compiler Support for Sparse Tensor Convolutions"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6752-3274","authenticated-orcid":false,"given":"Peiming","family":"Liu","sequence":"first","affiliation":[{"name":"Google, Kirkland, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6221-1389","authenticated-orcid":false,"given":"Alexander J","family":"Root","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2263-7339","authenticated-orcid":false,"given":"Anlun","family":"Xu","sequence":"additional","affiliation":[{"name":"Google, Mountain View, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5110-5909","authenticated-orcid":false,"given":"Yinying","family":"Li","sequence":"additional","affiliation":[{"name":"Google, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2267-903X","authenticated-orcid":false,"given":"Fredrik","family":"Kjolstad","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0333-7413","authenticated-orcid":false,"given":"Aart J.C.","family":"Bik","sequence":"additional","affiliation":[{"name":"Google, Mountain View, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2021. https:\/\/reviews.llvm.org\/D109783."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579990.3580020"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544559"},{"key":"e_1_3_1_5_2","volume-title":"Compiler Support for Sparse Matrix Computations","author":"Bik Aart J.C","year":"1996","unstructured":"Aart J.C. Bik. 1996. Compiler Support for Sparse Matrix Computations. Ph. D. Dissertation. Department of Computer Science, Leiden University. ISBN 90-9009442-3."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/290200.287636"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1995.1141"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.485501"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-021-05783-5"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3276493"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5222004"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_1_13_2","volume-title":"Advances in Neural Information Processing Systems","author":"Denker John","year":"1988","unstructured":"John Denker, W. Gardner, Hans Graf, Donnie Henderson, R. Howard, W. Hubbard, L. D. Jackel, Henry Baird, and Isabelle Guyon. 1988. Neural Network Recognizer for Hand-Written Zip Code Digits. In Advances in Neural Information Processing Systems, D. Touretzky(Ed.), Vol. 1. Morgan-Kaufmann. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/1988\/file\/a97da629b098b75c294dffdc3e463904-Paper.pdf"},{"key":"e_1_3_1_14_2","article-title":"Improving Locality in Sparse and Dense Matrix Multiplications","author":"Dezfuli Mohammad Mahdi Salehi","year":"2024","unstructured":"Mohammad Mahdi Salehi Dezfuli and Kazem Cheshmi. 2024. Improving Locality in Sparse and Dense Matrix Multiplications. arXiv preprint arXiv:2407.00243 (2024).","journal-title":"arXiv preprint arXiv:2407.00243"},{"key":"e_1_3_1_15_2","volume-title":"Direct Methods for Sparse Matrices","author":"Duff Iain S","year":"1990","unstructured":"Iain S. Duff, A.M. Erisman, and J.K. Reid. 1990. Direct Methods for Sparse Matrices. Oxford Science Publications, Oxford."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185540"},{"key":"e_1_3_1_17_2","article-title":"Fast Sparse ConvNets","author":"Elsen Erich","year":"2019","unstructured":"Erich Elsen, Marat Dukhan, Trevor Gale, and Karen Simonyan. 2019. Fast Sparse ConvNets. CoRR abs\/1911.09723 (2019). arXiv:1911.09723 http:\/\/arxiv.org\/abs\/1911.09723","journal-title":"CoRR"},{"key":"e_1_3_1_18_2","unstructured":"William Fedus Jeff Dean and Barret Zoph. 2022. A review of sparse expert models in deep learning. arXiv preprint arXiv:2209.01667 (2022)."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088197"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSSC.1969.300225"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00344251"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/578296"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414655"},{"key":"e_1_3_1_24_2","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).","journal-title":"arXiv preprint arXiv:1510.00149"},{"key":"e_1_3_1_25_2","first-page":"770","article-title":"Deep residual learning for image recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770\u2013778.","journal-title":"Proceedings of the IEEE conference on computer vision and pattern recognition"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485505"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3559009.3569668"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1088\/2634-4386\/ac7c8a"},{"key":"e_1_3_1_29_2","unstructured":"Fredrik Kjolstad. 2020. Sparse Tensor Algebra Compilation. Ph. D. Dissertation. Massachusetts Institute of Technology Cambridge MA."},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Fredrik Kjolstad et al. 2017. TACO: The Tensor Algebra Compiler. Open-source project available at http:\/\/tensor-compiler.org\/.","DOI":"10.1145\/3133901"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661185"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_1_33_2","unstructured":"Fredrik Berg Kjolstad. 2020. Sparse tensor algebra compilation. Ph.D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0002751"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/1273442.1250761"},{"key":"e_1_3_1_36_2","unstructured":"Mark Kurtz Justin Kopinsky Rati Gelashvili Alexander Matveev John Carr Michael Goin William Leiserson Sage Moore Bill Nell Nir Shavit and Dan Alistarh. 2020. Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research Vol. 119) Hal Daume III and Aarti Singh(Eds.). PMLR Virtual 5533\u20135543. http:\/\/proceedings.mlr.press\/v119\/kurtz20a.html"},{"key":"e_1_3_1_37_2","article-title":"MLIR: A Compiler Infrastructure for the End of Moore's Law","author":"Lattner Chris","year":"2020","unstructured":"Chris Lattner, Jacques A. Pienaar, Mehdi Amini, Uday Bondhugula, River Riddle, Albert Cohen, Tatiana Shpeisman, Andy Davis, Nicolas Vasilache, and Oleksandr Zinenko. 2020. MLIR: A Compiler Infrastructure for the End of Moore's Law. CoRR abs\/2002.11054 (2020). arXiv:2002.11054 https:\/\/arxiv.org\/abs\/2002.11054","journal-title":"CoRR"},{"key":"e_1_3_1_38_2","first-page":"515","volume-title":"Conference on Parsimony and Learning","author":"Lee Joo Hyung","year":"2024","unstructured":"Joo Hyung Lee, Wonpyo Park, Nicole Elyse Mitchell, Jonathan Pilault, Johan Samir Obando Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, et al. 2024. JaxPruner: A concise library for sparsity research. In Conference on Parsimony and Learning. PMLR, 515\u2013528."},{"key":"e_1_3_1_39_2","article-title":"Enabling Sparse Winograd Convolution by Native Pruning","author":"Li Sheng R","year":"2017","unstructured":"Sheng R. Li, Jongsoo Park, and Ping Tak Peter Tang. 2017. Enabling Sparse Winograd Convolution by Native Pruning. CoRR abs\/1702.08597 (2017). arXiv:1702.08597 http:\/\/arxiv.org\/abs\/1702.08597","journal-title":"CoRR"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335240"},{"key":"e_1_3_1_41_2","unstructured":"Iman Mirzadeh Keivan Alizadeh Sachin Mehta Carlo C Del Mundo Oncel Tuzel Golnoosh Samei Mohammad Rastegari and Mehrdad Farajtabar. 2023. ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models. arXiv:2310.04564 [cs.LG]"},{"key":"e_1_3_1_42_2","first-page":"807","article-title":"Rectified linear units improve restricted boltzmann machines","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10). 807\u2013814.","journal-title":"Proceedings of the 27th international conference on machine learning (ICML-10)"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378534"},{"key":"e_1_3_1_44_2","article-title":"Faster CNNs with Direct Sparse Convolutions and Guided Pruning","author":"Park Jongsoo","year":"2017","unstructured":"Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2017. Faster CNNs with Direct Sparse Convolutions and Guided Pruning. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rJPcZ3txx","journal-title":"International Conference on Learning Representations"},{"key":"e_1_3_1_45_2","volume-title":"Sparse Matrix Technology","author":"Pissanetsky Sergio","year":"1984","unstructured":"Sergio Pissanetsky. 1984. Sparse Matrix Technology. Academic Press, London."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185528"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.5201\/ipol.2015.35"},{"key":"e_1_3_1_49_2","article-title":"Compilers for Regular and Irregular Stencils: Some Shared Problems and Solutions","author":"Strout Michelle Mills","year":"2013","unstructured":"Michelle Mills Strout. 2013. Compilers for Regular and Irregular Stencils: Some Shared Problems and Solutions. In Proceedings of Workshop on Optimizing Stencil Computations (WOSC).","journal-title":"Proceedings of Workshop on Optimizing Stencil Computations (WOSC)"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_3_1_51_2","first-page":"117","article-title":"The pochoir stencil compiler","author":"Tang Yuan","year":"2011","unstructured":"Yuan Tang, Rezaul Alam Chowdhury, Bradley C Kuszmaul, Chi-Keung Luk, and Charles E Leiserson. 2011. The pochoir stencil compiler. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures. 117\u2013128.","journal-title":"Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures"},{"key":"e_1_3_1_52_2","volume-title":"Sparse Matrices","author":"Tewarson Reginal P","year":"1973","unstructured":"Reginal P. Tewarson. 1973. Sparse Matrices. Academic Press, New York, NY."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.629489"},{"key":"e_1_3_1_54_2","article-title":"Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions","author":"Vasilache Nicolas","year":"2018","unstructured":"Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR abs\/1802.04730 (2018). arXiv:1802.04730 http:\/\/arxiv.org\/abs\/1802.04730","journal-title":"CoRR"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/29.21701"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3591302"},{"key":"e_1_3_1_57_2","article-title":"Unified Convolution Framework: A compiler-based approach to support sparse convolutions","volume":"5","author":"Won Jaeyeon","year":"2023","unstructured":"Jaeyeon Won, Changwan Hong, Charith Mendis, Joel Emer, and Saman Amarasinghe. 2023. Unified Convolution Framework: A compiler-based approach to support sparse convolutions. Proceedings of Machine Learning and Systems 5 (2023).","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_58_2","first-page":"660","article-title":"SparseTIR: Composable abstractions for sparse compilation in deep learning","author":"Ye Zihao","year":"2023","unstructured":"Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable abstractions for sparse compilation in deep learning. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 660\u2013678.","journal-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3."},{"key":"e_1_3_1_59_2","first-page":"18309","article-title":"Advancing model pruning via bi-level optimization","volume":"35","author":"Zhang Yihua","year":"2022","unstructured":"Yihua Zhang, Yuguang Yao, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, and Sijia Liu. 2022. Advancing model pruning via bi-level optimization. Advances in Neural Information Processing Systems 35 (2022), 18309\u201318326.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-1116-6"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689721","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689721","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T09:03:54Z","timestamp":1770195834000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689721"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":59,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2024,10,8]]}},"alternative-id":["10.1145\/3689721"],"URL":"https:\/\/doi.org\/10.1145\/3689721","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]},"assertion":[{"value":"2024-04-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}