{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:35Z","timestamp":1750309475636,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,11,9]],"date-time":"2024-11-09T00:00:00Z","timestamp":1731110400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62402311"],"award-info":[{"award-number":["62402311"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2025,1,31]]},"abstract":"<jats:p>Network sparsification serves as an effective technique to accelerate Deep Neural Network (DNN) inference. However, existing sparsification techniques often rely on structured sparsity, which yields limited benefits. This is primarily due to the significant memory and computational overhead introduced by numerous sparse storage formats during address generation and gradient updates. Additionally, many of these solutions are tailored solely for the inference phase, neglecting the crucial training phase.<\/jats:p>\n          <jats:p\/>\n          <jats:p>In this article, we introduce STCO, a novel Sparse Tensor Compilation Optimization technique that significantly enhances training efficiency through structured sparse tensor compilation. Central to STCO is the Tensorization-aware Index Entity (TIE) format, which effectively represents structured sparse tensors by eliminating redundant indices and minimizing storage overhead. 
The TIE format is central to the Address-Carry flow (AC flow) pass, which optimizes data layout at the computational-graph level and yields more compact sparse tensor representations. A shape inference pass then uses the AC flow to derive optimized tensor shapes, further improving sparse tensor operations. The AC flow also dynamically tracks nonzero addresses, extending the benefits of sparse optimization to both forward and backward propagation; this allows a smooth transition to sparse tensor compilation without significant modifications to existing codebases. To further boost training performance, we implement an operator-level AC flow optimization pass tailored to structured sparse tensors, which generates addresses efficiently and keeps the computational overhead of sparse tensor operations minimal. STCO can be integrated into various frameworks and compilers, providing a robust solution for training with structured sparse tensors. Experiments show that STCO achieves speedups of 3.64\u00d7, 5.43\u00d7, 4.89\u00d7, and 3.91\u00d7 over state-of-the-art sparse formats on VGG16, ResNet-18, MobileNetV1, and MobileNetV2, respectively. 
These findings underscore the efficiency of our proposed approach in leveraging structured sparsity for DNN training acceleration.<\/jats:p>","DOI":"10.1145\/3701033","type":"journal-article","created":{"date-parts":[[2024,10,21]],"date-time":"2024-10-21T10:57:53Z","timestamp":1729508273000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["STCO: Enhancing Training Efficiency via Structured Sparse Tensor Compilation Optimization"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8086-6802","authenticated-orcid":false,"given":"Shiyuan","family":"Huang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China and Shanghai Qi Zhi Institute, Shanghai China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8769-293X","authenticated-orcid":false,"given":"Fangxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai China and Shanghai Qi Zhi Institute, Shanghai China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2495-1622","authenticated-orcid":false,"given":"Tian","family":"Li","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co Ltd, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2157-4927","authenticated-orcid":false,"given":"Zongwu","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai China and Shanghai Qi Zhi Institute, Shanghai China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6964-8910","authenticated-orcid":false,"given":"Ning","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2939-6534","authenticated-orcid":false,"given":"Haomin","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7353-8798","authenticated-orcid":false,"given":"Li","family":"Jiang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai China and Shanghai Qi Zhi Institute, Shanghai China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,9]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2021. cuSPARSELt: A high-performance CUDA library for sparse matrix-matrix multiplication. Retrieved 10 April 2022 from https:\/\/docs.nvidia.com\/cuda\/cusparselt\/index.html"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1379022.1375595"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","unstructured":"Aydin Bulu\u00e7 Jeremy T. Fineman Matteo Frigo John R. Gilbert and Charles E. Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures. ACM New York NY USA. DOI:10.1145\/1583991.1584053","DOI":"10.1145\/1583991.1584053"},{"key":"e_1_3_2_5_2","first-page":"578","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, Carlsbad, CA, 578\u2013594. Retrieved from https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/chen"},{"key":"e_1_3_2_6_2","unstructured":"Scott Cyphers Arjun K. 
Bansal Anahita Bhiwandiwalla Jayaram Bobba Matthew Brookhart Avijit Chakraborty Will Constable Christian Convey Leona Cook Omar Kanawi Robert Kimball Jason Knight Nikolay Korovaiko Varun Kumar Yixing Lao Christopher R. Lishka Jaikrishnan Menon Jennifer Myers Sandeep Aswath Narayana Adam Procter and Tristan J. Webb. 2018. Intel nGraph: An intermediate representation compiler and executor for deep learning. Retrieved from https:\/\/arxiv.org\/abs\/1801.08058"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","unstructured":"Iain S. Duff Michael A. Heroux and Roldan Pozo. 2002. An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum. ACM Transactions on Mathematical Software 28 2 (2002) 239\u2013267. DOI:10.1145\/567806.567810","DOI":"10.1145\/567806.567810"},{"key":"e_1_3_2_8_2","series-title":"SC\u201920","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Gale Trevor","year":"2020","unstructured":"Trevor Gale, Matei Zaharia, Cliff Young, and Erich Elsen. 2020. Sparse GPU kernels for deep learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, Georgia) (SC\u201920). IEEE Press, Article 17, 14 pages."},{"key":"e_1_3_2_9_2","unstructured":"Aidan N. Gomez Ivan Zhang Siddhartha Rao Kamalakara Divyam Madaan Kevin Swersky Yarin Gal and Geoffrey E. Hinton. 2019. Learning sparse networks using targeted dropout. Retrieved from https:\/\/arxiv.org\/abs\/1905.13678"},{"key":"e_1_3_2_10_2","unstructured":"Aidan N. Gomez Ivan Zhang Kevin Swersky Yarin Gal and Geoffrey E. Hinton. 2011. Targeted dropout. Retrieved 6 May 2022 from https:\/\/openreview.net\/pdf?id=HkghWScuoQ"},{"volume-title":"Tensorflow-lite.","year":"2019","key":"e_1_3_2_11_2","unstructured":"Google.2019. In Tensorflow-lite. 
Retrieved 6 May 2022 from https:\/\/www.tensorflow.org\/mobile\/tflite"},{"key":"e_1_3_2_12_2","volume-title":"Advances in Neural Information Processing Systems","author":"Guo Yiwen","year":"2016","unstructured":"Yiwen Guo, Anbang Yao, and Yurong Chen. 2016. Dynamic network surgery for efficient DNNs. In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.). Vol. 29. Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2016\/file\/2823f4797102ce1a1aec05359cc16dd9-Paper.pdf"},{"key":"e_1_3_2_13_2","volume-title":"Advances in Neural Information Processing Systems","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.). Vol. 28. Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2015\/file\/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf"},{"key":"e_1_3_2_14_2","unstructured":"Andrew Hard Kanishka Rao Rajiv Mathews Swaroop Ramaswamy Fran\u00e7oise Beaufays Sean Augenstein Hubert Eichner Chlo\u00e9 Kiddon and Daniel Ramage. 2019. Federated learning for mobile keyboard prediction. Retrieved from https:\/\/arxiv.org\/abs\/1811.03604"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"He Yihui","year":"2017","unstructured":"Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","unstructured":"Changwan Hong Aravind Sukumaran-Rajam Israt Nisa Kunal Singh and P. Sadayappan. 2019. 
Adaptive sparse tiling for sparse matrix multiplication. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. ACM New York NY USA. DOI:10.1145\/3293883.3295712","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_3_2_17_2","article-title":"SpMMPlu-Pro: An enhanced compiler plug-in for efficient SpMM and sparsity propagation algorithm","author":"Huang Shiyuan","year":"2024","unstructured":"Shiyuan Huang, Fangxin Liu, Tao Yang, Zongwu Wang, Ning Yang, and Li Jiang. 2024. SpMMPlu-Pro: An enhanced compiler plug-in for efficient SpMM and sparsity propagation algorithm. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024).","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_2_18_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Huang Zehao","year":"2018","unstructured":"Zehao Huang and Naiyan Wang. 2018. Data-driven sparse structure selection for deep neural networks. In Proceedings of the European Conference on Computer Vision (ECCV\u201918)."},{"key":"e_1_3_2_19_2","first-page":"180","volume-title":"Proceedings of the 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919)","author":"Kjolstad Fredrik","year":"2019","unstructured":"Fredrik Kjolstad, Willow Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor algebra compilation with workspaces. In Proceedings of the 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919). 180\u2013192. 
DOI:10.1109\/CGO.2019.8661185"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_2_21_2","first-page":"1029","volume-title":"Proceedings of the 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201924)","author":"Liu Fangxin","year":"2024","unstructured":"Fangxin Liu, Ning Yang, Haomin Li, Zongwu Wang, Zhuoran Song, Songwen Pei, and Li Jiang. 2024. SPARK: Scalable and precision-aware acceleration of neural networks via efficient encoding. In Proceedings of the 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201924). IEEE, 1029\u20131042."},{"key":"e_1_3_2_22_2","first-page":"1692","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Liu Fangxin","year":"2022","unstructured":"Fangxin Liu, Wenbo Zhao, Yongbiao Chen, Zongwu Wang, and Li Jiang. 2022. Spikeconverter: An efficient conversion framework zipping the gap between artificial neural networks and spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36, 1692\u20131701."},{"key":"e_1_3_2_23_2","first-page":"5281","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Liu Fangxin","year":"2021","unstructured":"Fangxin Liu, Wenbo Zhao, Zhezhi He, Yanzhi Wang, Zongwu Wang, Changzhi Dai, Xiaoyao Liang, and Li Jiang. 2021. Improving neural network efficiency via post-training quantization with adaptive floating-point. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). 5281\u20135290."},{"key":"e_1_3_2_24_2","first-page":"417","volume-title":"Proceedings of the 2021 IEEE 39th International Conference on Computer Design (ICCD\u201921)","author":"Liu Fangxin","year":"2021","unstructured":"Fangxin Liu, Wenbo Zhao, Zhezhi He, Zongwu Wang, Yilong Zhao, Tao Yang, Jingnai Feng, Xiaoyao Liang, and Li Jiang. 2021. 
SME: ReRAM-based sparse-multiplication-engine to squeeze-out bit sparsity of neural network. In Proceedings of the 2021 IEEE 39th International Conference on Computer Design (ICCD\u201921). IEEE, 417\u2013424."},{"key":"e_1_3_2_25_2","first-page":"259","volume-title":"Proceedings of the 59th ACM\/IEEE Design Automation Conference","author":"Liu Fangxin","year":"2022","unstructured":"Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, and Li Jiang. 2022. EBSP: Evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks. In Proceedings of the 59th ACM\/IEEE Design Automation Conference. 259\u2013264."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","unstructured":"Fangxin Liu Wenbo Zhao Zongwu Wang Yongbiao Chen Xiaoyao Liang and Li Jiang. 2024. ERA-BS: Boosting the efficiency of ReRAM-based PIM accelerator with fine-grained bit-level sparsity. IEEE Transactions on Computers 73 9 (2024) 2320\u20132334. DOI:10.1109\/TC.2023.3290869","DOI":"10.1109\/TC.2023.3290869"},{"key":"e_1_3_2_27_2","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Liu Zhuang","year":"2017","unstructured":"Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. 2017. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3063265"},{"key":"e_1_3_2_29_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Mallya Arun","year":"2018","unstructured":"Arun Mallya and Svetlana Lazebnik. 2018. PackNet: Adding multiple tasks to a single network by iterative pruning. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","unstructured":"Matiur Rahman Minar and Jibon Naher. 2018. Recent Advances in Deep Learning: An Overview. Unpublished. DOI:10.13140\/RG.2.2.24831.10403","DOI":"10.13140\/RG.2.2.24831.10403"},{"key":"e_1_3_2_31_2","unstructured":"Asit Mishra Jorge Albericio Latorre Jeff Pool Darko Stosic Dusan Stosic Ganesh Venkatesh Chong Yu and Paulius Micikevicius. 2021. Accelerating sparse deep neural networks. Retrieved from https:\/\/arxiv.org\/abs\/2104.08378"},{"key":"e_1_3_2_32_2","first-page":"377","volume-title":"Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP\u201924)","author":"Pang Meng","year":"2024","unstructured":"Meng Pang, Xiang Fei, Peng Qu, Youhui Zhang, and Zhaolin Li. 2024. A row decomposition-based approach for sparse matrix multiplication on GPUs. In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP\u201924). ACM, New York, NY, USA, 377\u2013389. DOI:10.1145\/3627535.3638470"},{"key":"e_1_3_2_33_2","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga Alban Desmaison Andreas Kopf Edward Yang Zachary DeVito Martin Raison Alykhan Tejani Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai and Soumith Chintala. 2019. PyTorch: An imperative style high-performance deep learning library. In Advances in Neural Information Processing Systems Curran Associates Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","unstructured":"Ari Rasch. 2024. (De\/Re)-composition of data-parallel computations via multi-dimensional homomorphisms. ACM Trans. Program. Lang. Syst. 
46 3 (October 2024). DOI:10.1145\/3665643","DOI":"10.1145\/3665643"},{"key":"e_1_3_2_35_2","unstructured":"Nadav Rotem Jordan Fix Saleem Abdulrasool Garret Catron Summer Deng Roman Dzhabarov Nick Gibson James Hegeman Meghan Lele Roman Levenstein Jack Montgomery Bert Maher Satish Nadathur Jakob Olesen Jongsoo Park Artem Rakhov Misha Smelyanskiy and Man Wang. 2019. Glow: Graph lowering compiler techniques for neural networks. Retrieved from https:\/\/arxiv.org\/abs\/1805.00907"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.9"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","unstructured":"Shaden Smith and George Karypis. 2015. Tensor-matrix products with a compressed sparse tensor. In Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms. ACM New York NY USA. DOI:10.1145\/2833179.2833183","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"108","DOI":"10.23919\/DATE.2019.8715135","volume-title":"Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE\u201919)","author":"Song Zhuoran","year":"2019","unstructured":"Zhuoran Song, Ru Wang, Dongyu Ru, Zhenghao Peng, Hongru Huang, Hai Zhao, Xiaoyao Liang, and Li Jiang. 2019. Approximate random dropout for DNN training acceleration in GPGPU. In Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE\u201919). 108\u2013113. DOI:10.23919\/DATE.2019.8715135"},{"key":"e_1_3_2_39_2","volume-title":"SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations-Version 2","author":"Saad Youcef","year":"1994","unstructured":"Youcef Saad. 1994. SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations-Version 2. Technical Report. Tech. Rep. Computer Science Department, Univ. 
of Minnesota, Minneapolis, MN."},{"issue":"1","key":"e_1_3_2_40_2","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava Nitish","year":"2014","unstructured":"Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_41_2","unstructured":"Xu Sun Xuancheng Ren Shuming Ma and Houfeng Wang. 2017. meProp: Sparsified back propagation for accelerated deep learning with reduced overfitting. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research). PMLR 3299\u20133308. Retrieved from https:\/\/proceedings.mlr.press\/v70\/sun17c.html"},{"key":"e_1_3_2_42_2","unstructured":"Nicolas Vasilache Oleksandr Zinenko Theodoros Theodoridis Priya Goyal Zachary DeVito William S. Moses Sven Verdoolaege Andrew Adams and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. Retrieved from https:\/\/arxiv.org\/abs\/1802.04730"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_3_2_44_2","unstructured":"Wei Wen Chunpeng Wu Yandan Wang Yiran Chen and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems. Curran Associates Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2016\/file\/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf"},{"key":"e_1_3_2_45_2","volume-title":"Advances in Neural Information Processing Systems","author":"Wen Wei","year":"2016","unstructured":"Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. 
In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.). Vol. 29. Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2016\/file\/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf"},{"key":"e_1_3_2_46_2","first-page":"660","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS\u201923)","author":"Ye Zihao","year":"2023","unstructured":"Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable abstractions for sparse compilation in deep learning. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS\u201923). ACM, New York, NY, USA, 660\u2013678. DOI:10.1145\/3582016.3582047"},{"key":"e_1_3_2_47_2","unstructured":"Haoran You Chaojian Li Pengfei Xu Yonggan Fu Yue Wang Xiaohan Chen Richard G. Baraniuk Zhangyang Wang and Yingyan Lin. 2022. Drawing early-bird tickets: Towards more efficient training of deep networks. Retrieved from https:\/\/arxiv.org\/abs\/1909.11957"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","unstructured":"Jie Zhao Bojie Li Wang Nie Zhen Geng Renwei Zhang Xiong Gao Bin Cheng Chen Wu Yun Cheng Zheng Li Peng Di Kun Zhang and Xuefeng Jin. 2021. AKG: Automatic kernel generation for neural processing units using polyhedral transformations. In (PLDI 2021) Association for Computing Machinery Virtual Canada 1233\u20131248. 
DOI:10.1145\/3453483.3454106","DOI":"10.1145\/3453483.3454106"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3566054"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701033","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701033","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:24Z","timestamp":1750295844000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701033"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,9]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,31]]}},"alternative-id":["10.1145\/3701033"],"URL":"https:\/\/doi.org\/10.1145\/3701033","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2024,11,9]]},"assertion":[{"value":"2024-04-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}