{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T01:15:33Z","timestamp":1772846133762,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CNS-1763658, CCF-2107470, CNS-1956007, CCF-2028861"],"award-info":[{"award-number":["CNS-1763658, CCF-2107470, CNS-1956007, CCF-2028861"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,28]]},"DOI":"10.1145\/3524059.3532369","type":"proceedings-article","created":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T16:13:11Z","timestamp":1655395991000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Dense dynamic blocks"],"prefix":"10.1145","author":[{"given":"Serif","family":"Yesil","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]},{"given":"Jos\u00e9 E.","family":"Moreira","sequence":"additional","affiliation":[{"name":"IBM Research"}]},{"given":"Josep","family":"Torrellas","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]}],"member":"320","published-online":{"date-parts":[[2022,6,28]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2015. Intel Math Kernel Library Inspector-executor Sparse BLAS Routines. 
https:\/\/software.intel.com\/en-us\/articles\/intel-math-kernel-library-inspector-executor-sparse-blas-routines"},{"key":"e_1_3_2_1_2_1","volume-title":"NVIDIA Tesla V100 GPU Architecture. https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. [Online","year":"2021","unstructured":"2017. NVIDIA Tesla V100 GPU Architecture. https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_3_1","volume-title":"Intel Xeon Platinum 8268 Processor. https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/192481\/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. [Online","year":"2021","unstructured":"2019. Intel Xeon Platinum 8268 Processor. https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/192481\/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_4_1","unstructured":"2020. Intel oneAPI Math Kernel Library. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/oneapi\/comp-onents\/onemkl"},{"key":"e_1_3_2_1_5_1","volume-title":"NVIDIA A100 Tensor Core GPU Architecture. 
https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf. [Online","year":"2021","unstructured":"2020. NVIDIA A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_6_1","unstructured":"2021. Intel Architecture Instruction Set Extensions Programming Reference. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/download\/intel-architecture-instruction-set-extensions-programming-reference.html. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38750-0_12"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/SUPERC.1992.236712"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.125"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the Symposium on High Performance Computing, HPC 2015","author":"Anzt Hartwig","year":"2015","unstructured":"Hartwig Anzt, Stanimire Tomov, and Jack J. Dongarra. 2015. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product. In Proceedings of the Symposium on High Performance Computing, HPC 2015, part of the 2015 Spring Simulation Multiconference, SpringSim '15, Alexandria, VA, USA, April 12--15, 2015, Layne T. Watson, Josef Weinbub, Masha Sosonkina, and William I. Thacker (Eds.). SCS\/ACM, 75--82. 
http:\/\/dl.acm.org\/citation.cfm?id=2872609"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.110"},{"key":"e_1_3_2_1_12_1","volume-title":"https:\/\/www.redbooks.ibm.com\/redpieces\/pdfs\/redp5612.pdf. [Online","author":"Bhat Puneeth","year":"2021","unstructured":"Puneeth Bhat, Jose Moreira, and Satish Kumar Sadasivam. 2021. Matrix-Multiply Assist (MMA) Best Practices Guide. https:\/\/www.redbooks.ibm.com\/redpieces\/pdfs\/redp5612.pdf. 
[Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1583991.1584053"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007083801"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693471"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414655"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208062"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00075"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Huang Guyue","year":"2020","unstructured":"Guyue Huang, Guohao Dai, Yu Wang, and Huazhong Yang. 2020. GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, Georgia) (SC '20). 
IEEE Press, Article 72, 12 pages."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441585"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041296"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374546"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2009.21"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSE.2009.223"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_3_2_1_29_1","volume-title":"Bishop","author":"Kreutzer Moritz","year":"2013","unstructured":"Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, and Alan R. Bishop. 2013. A unified sparse matrix data format for modern processors with wide SIMD units. CoRR abs\/1307.6209 (2013). arXiv:1307.6209 http:\/\/arxiv.org\/abs\/1307.6209"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00091"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462181"},{"key":"e_1_3_2_1_32_1","volume-title":"Silvia Melitta Mueller, Brett Olsson, Satish Sadasivam, Baptiste Saleil, Bill Schmidt, Rajalakshmi Srinivasaraghavan, Shricharan Srivatsan, Brian W. Thompto, Andreas Wagner, and Nelson Wu.","author":"Moreira Jos\u00e9 E.","year":"2021","unstructured":"Jos\u00e9 E. 
Moreira, Kit Barton, Steven Battle, Peter Bergner, Ramon Bertran, Puneeth Bhat, Pedro Caldeira, David Edelsohn, Gordon Fossum, Brad Frey, Nemanja Ivanovic, Chip Kerchner, Vincent Lim, Shakti Kapoor, Tulio Machado Filho, Silvia Melitta Mueller, Brett Olsson, Satish Sadasivam, Baptiste Saleil, Bill Schmidt, Rajalakshmi Srinivasaraghavan, Shricharan Srivatsan, Brian W. Thompto, Andreas Wagner, and Nelson Wu. 2021. A matrix math facility for Power ISA(TM) processors. CoRR abs\/2104.03142 (2021). arXiv:2104.03142 https:\/\/arxiv.org\/abs\/2104.03142"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331562"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00017"},{"key":"e_1_3_2_1_38_1","volume-title":"SPARSEKIT: A Basic Toolkit for Sparse Matrix Computations. https:\/\/www-users.cs.umn.edu\/~saad\/PDF\/RIACS-90-20.pdf. [Online","author":"Saad Youcef","year":"2021","unstructured":"Youcef Saad. 2021. 
SPARSEKIT: A Basic Toolkit for Sparse Matrix Computations. https:\/\/www-users.cs.umn.edu\/~saad\/PDF\/RIACS-90-20.pdf. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751244"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3058632"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065944.1065981"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2002.10025"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/11557654_91"},{"key":"e_1_3_2_1_45_1","volume-title":"Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. CoRR abs\/1909.01315","author":"Wang Minjie","year":"2019","unstructured":"Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J. Smola, and Zheng Zhang. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. CoRR abs\/1909.01315 (2019). arXiv:1909.01315 http:\/\/arxiv.org\/abs\/1909.01315"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915220"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330354"},{"key":"e_1_3_2_1_48_1","unstructured":"Takuma Yamaguchi and Federico Busato. 2021. Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores. 
https:\/\/developer.nvidia.com\/blog\/accelerating-matrix-multiplication-with-block-sparse-format-and-nvidia-tensor-cores\/. [Online; accessed 02-April-2021]."},{"key":"e_1_3_2_1_49_1","volume-title":"Owens","author":"Yang Carl","year":"2018","unstructured":"Carl Yang, Aydin Bulu\u00e7, and John D. Owens. 2018. Design Principles for Sparse Matrix Multiplication on the GPU. In Euro-Par 2018: Parallel Processing, Marco Aldinucci, Luca Padovani, and Massimo Torquati (Eds.). 
Springer International Publishing, Cham, 672--687."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00090"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851500"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2591635.2667180"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2004.1342561"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2020.106848"}],"event":{"name":"ICS '22: 2022 International Conference on Supercomputing","location":"Virtual Event","acronym":"ICS '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 36th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532369","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524059.3532369","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524059.3532369","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:37Z","timestamp":1750188637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532369"}},"subtitle":["optimizing SpMM for processors with vector and matrix units using machine learning 
techniques"],"short-title":[],"issued":{"date-parts":[[2022,6,28]]},"references-count":54,"alternative-id":["10.1145\/3524059.3532369","10.1145\/3524059"],"URL":"https:\/\/doi.org\/10.1145\/3524059.3532369","relation":{},"subject":[],"published":{"date-parts":[[2022,6,28]]},"assertion":[{"value":"2022-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}