{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T15:02:27Z","timestamp":1773414147663,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":59,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,4,17]],"date-time":"2021-04-17T00:00:00Z","timestamp":1618617600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"DARPA SDH","award":["HR0011-18-3-0007"],"award-info":[{"award-number":["HR0011-18-3-0007"]}]},{"name":"Semiconductor Research Corporation","award":["2020-AH-2985"],"award-info":[{"award-number":["2020-AH-2985"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,19]]},"DOI":"10.1145\/3445814.3446702","type":"proceedings-article","created":{"date-parts":[[2021,4,11]],"date-time":"2021-04-11T17:06:26Z","timestamp":1618160786000},"page":"687-701","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":137,"title":["Gamma: leveraging Gustavson\u2019s algorithm to accelerate sparse matrix multiplication"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1034-2306","authenticated-orcid":false,"given":"Guowei","family":"Zhang","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9683-914X","authenticated-orcid":false,"given":"Nithya","family":"Attaluri","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3459-5466","authenticated-orcid":false,"given":"Joel S.","family":"Emer","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2453-2904","authenticated-orcid":false,"given":"Daniel","family":"Sanchez","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,4,17]]},"reference":[{"key":"e_1_3_2_1_1_1","series-title":"SIAM Journal on Scientific Computing, 38 ( 6 )","volume-title":"Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication","author":"Azad Ariful","year":"2016","unstructured":"Ariful Azad , Grey Ballard , Aydin Buluc , James Demmel , Laura Grigori , Oded Schwartz , Sivan Toledo , and Samuel Williams . Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication . SIAM Journal on Scientific Computing, 38 ( 6 ) , 2016 . Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. SIAM Journal on Scientific Computing, 38 ( 6 ), 2016."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.75"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_3_2_1_4_1","volume-title":"Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC), 3 ( 3 )","author":"Ballard Grey","year":"2016","unstructured":"Grey Ballard , Alex Druinsky , Nicholas Knight , and Oded Schwartz . Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC), 3 ( 3 ) , 2016 . Grey Ballard, Alex Druinsky, Nicholas Knight, and Oded Schwartz. Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC), 3 ( 3 ), 2016."},{"key":"e_1_3_2_1_5_1","series-title":"SIAM Journal on Scientific Computing, 34 ( 4 )","volume-title":"Exposing fine-grained parallelism in algebraic multigrid methods","author":"Bell Nathan","year":"2012","unstructured":"Nathan Bell , Steven Dalton , and Luke N Olson . Exposing fine-grained parallelism in algebraic multigrid methods . SIAM Journal on Scientific Computing, 34 ( 4 ) , 2012 . Nathan Bell, Steven Dalton, and Luke N Olson. Exposing fine-grained parallelism in algebraic multigrid methods. SIAM Journal on Scientific Computing, 34 ( 4 ), 2012."},{"key":"e_1_3_2_1_6_1","volume-title":"Alessandro De Vita, and Roberto Car. O(N) tight-binding molecular dynamics on massively parallel computers: An orbital decomposition approach. Computer Physics Communications, 94 ( 2-3 )","author":"Canning Andrew","year":"1996","unstructured":"Andrew Canning , Giulia Galli , Francesco Mauri , Alessandro De Vita, and Roberto Car. O(N) tight-binding molecular dynamics on massively parallel computers: An orbital decomposition approach. Computer Physics Communications, 94 ( 2-3 ) , 1996 . Andrew Canning, Giulia Galli, Francesco Mauri, Alessandro De Vita, and Roberto Car. O(N) tight-binding molecular dynamics on massively parallel computers: An orbital decomposition approach. Computer Physics Communications, 94 ( 2-3 ), 1996."},{"key":"e_1_3_2_1_7_1","series-title":"SIAM Journal on Computing, 39 ( 5 )","volume-title":"More algorithms for all-pairs shortest paths in weighted graphs","author":"Chan Timothy M","year":"2010","unstructured":"Timothy M Chan . More algorithms for all-pairs shortest paths in weighted graphs . SIAM Journal on Computing, 39 ( 5 ) , 2010 . Timothy M Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM Journal on Computing, 39 ( 5 ), 2010."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 43rd annual International Symposium on Computer Architecture (ISCA-43)","author":"Chen Yu-Hsin","year":"2016","unstructured":"Yu-Hsin Chen , Joel Emer , and Vivienne Sze . Eyeriss : A spatial architecture for energy-eficient dataflow for convolutional neural networks . In Proceedings of the 43rd annual International Symposium on Computer Architecture (ISCA-43) , 2016 . Yu-Hsin Chen, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architecture for energy-eficient dataflow for convolutional neural networks. In Proceedings of the 43rd annual International Symposium on Computer Architecture (ISCA-43), 2016."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557049"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385963"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/800195.805928"},{"key":"e_1_3_2_1_13_1","volume-title":"Optimizing sparse matrix-matrix multiplication for the GPU. ACM Transactions on Mathematical Software (TOMS), 41 ( 4 )","author":"Dalton Steven","year":"2015","unstructured":"Steven Dalton , Luke Olson , and Nathan Bell . Optimizing sparse matrix-matrix multiplication for the GPU. ACM Transactions on Mathematical Software (TOMS), 41 ( 4 ) , 2015 . Steven Dalton, Luke Olson, and Nathan Bell. Optimizing sparse matrix-matrix multiplication for the GPU. ACM Transactions on Mathematical Software (TOMS), 41 ( 4 ), 2015."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939862"},{"key":"e_1_3_2_1_15_1","series-title":"SIAM Journal on Matrix Analysis and Applications, 13 ( 1 )","volume-title":"Sparse matrices in MATLAB: Design and implementation","author":"Gilbert John R","year":"1992","unstructured":"John R Gilbert , Cleve Moler , and Robert Schreiber . Sparse matrices in MATLAB: Design and implementation . SIAM Journal on Matrix Analysis and Applications, 13 ( 1 ) , 1992 . John R Gilbert, Cleve Moler, and Robert Schreiber. Sparse matrices in MATLAB: Design and implementation. SIAM Journal on Matrix Analysis and Applications, 13 ( 1 ), 1992."},{"key":"e_1_3_2_1_16_1","volume-title":"International Workshop on Applied Parallel Computing","author":"Gilbert John R","year":"2006","unstructured":"John R Gilbert , Steve Reinhardt , and Viral B Shah . High-performance graph algorithms from parallel sparse matrices . In International Workshop on Applied Parallel Computing , 2006 . John R Gilbert, Steve Reinhardt, and Viral B Shah. High-performance graph algorithms from parallel sparse matrices. In International Workshop on Applied Parallel Computing, 2006."},{"key":"e_1_3_2_1_17_1","volume-title":"Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS), 4 ( 3 )","author":"Gustavson Fred G","year":"1978","unstructured":"Fred G Gustavson . Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS), 4 ( 3 ) , 1978 . Fred G Gustavson. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS), 4 ( 3 ), 1978."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR)","author":"Han Song","year":"2016","unstructured":"Song Han , Huizi Mao , and William J Dally . Deep compression : Compressing deep neural networks with pruning, trained quantization and Hufman coding . In Proceedings of the 4th International Conference on Learning Representations (ICLR) , 2016 . Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Hufman coding. In Proceedings of the 4th International Conference on Learning Representations (ICLR), 2016."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358275"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00062"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815971"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374546"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358286"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1137856.1137866"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/602770.602808"},{"key":"e_1_3_2_1_27_1","first-page":"51","author":"Kepner Jeremy","year":"2015","unstructured":"Jeremy Kepner , David Bader , Ayd\u00fdn Bulu\u00e7 , John Gilbert , Timothy Mattson , and Henning Meyerhenke . Graphs, matrices, and the GraphBLAS: Seven good reasons. Procedia Computer Science , 51 , 2015 . Jeremy Kepner, David Bader, Ayd\u00fdn Bulu\u00e7, John Gilbert, Timothy Mattson, and Henning Meyerhenke. Graphs, matrices, and the GraphBLAS: Seven good reasons. Procedia Computer Science, 51, 2015.","journal-title":"Procedia Computer Science"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661185"},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)","author":"Kjolstad Fredrik","year":"2018","unstructured":"Fredrik Kjolstad , Shoaib Kamil , Stephen Chou , David Lugato , and Saman Amarasinghe . The tensor algebra compiler . In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) , 2018 . Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. The tensor algebra compiler. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2018."},{"key":"e_1_3_2_1_30_1","volume-title":"The SuiteSparse matrix collection website interface. Journal of Open Source Software, 4 ( 35 )","author":"Kolodziej Scott P","year":"2019","unstructured":"Scott P Kolodziej , Mohsen Aznaveh , Matthew Bullock , Jarrett David , Timothy A Davis , Matthew Henderson , Yifan Hu , and Read Sandstrom . The SuiteSparse matrix collection website interface. Journal of Open Source Software, 4 ( 35 ) , 2019 . Scott P Kolodziej, Mohsen Aznaveh, Matthew Bullock, Jarrett David, Timothy A Davis, Matthew Henderson, Yifan Hu, and Read Sandstrom. The SuiteSparse matrix collection website interface. Journal of Open Source Software, 4 ( 35 ), 2019."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750374"},{"key":"e_1_3_2_1_32_1","volume-title":"Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Liu Weifeng","year":"2014","unstructured":"Weifeng Liu and Brian Vinter . An eficient GPU general sparse matrix-matrix multiplication for irregular data . In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS) , 2014 . Weifeng Liu and Brian Vinter. An eficient GPU general sparse matrix-matrix multiplication for irregular data. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2014."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358254"},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the 47th International Conference on Parallel Processing","author":"Nagasaka Yusuke","year":"2018","unstructured":"Yusuke Nagasaka , Satoshi Matsuoka , Ariful Azad , and Ayd\u00fdn Bulu\u00e7 . Highperformance sparse matrix-matrix products on Intel KNL and multicore architectures . In Proceedings of the 47th International Conference on Parallel Processing , 2018 . Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Ayd\u00fdn Bulu\u00e7. Highperformance sparse matrix-matrix products on Intel KNL and multicore architectures. In Proceedings of the 47th International Conference on Parallel Processing, 2018."},{"key":"e_1_3_2_1_35_1","unstructured":"NanGate Inc. The NanGate 45nm open cell library. http:\/\/www.nangate.com\/ ?page_id= 2325 2008.  NanGate Inc. The NanGate 45nm open cell library. http:\/\/www.nangate.com\/ ?page_id= 2325 2008."},{"key":"e_1_3_2_1_36_1","volume-title":"GPU Technology Conference","author":"Naumov Maxim","year":"2010","unstructured":"Maxim Naumov , Lung-Sheng Chien , Philippe Vandermersch , and Ujval Kapasi . CUSPARSE library . In GPU Technology Conference , 2010 . Maxim Naumov, Lung-Sheng Chien, Philippe Vandermersch, and Ujval Kapasi. CUSPARSE library. In GPU Technology Conference, 2010."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the 24th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXIV)","author":"Pellauer Michael","year":"2019","unstructured":"Michael Pellauer , Yakun Sophia Shao , Jason Clemons , Neal Crago , Kartik Hegde , Rangharajan Venkatesan , Stephen W Keckler , Christopher W Fletcher , and Joel Emer . Bufets : An eficient and composable storage idiom for explicit decoupled data orchestration . In Proceedings of the 24th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXIV) , 2019 . Michael Pellauer, Yakun Sophia Shao, Jason Clemons, Neal Crago, Kartik Hegde, Rangharajan Venkatesan, Stephen W Keckler, Christopher W Fletcher, and Joel Emer. Bufets: An eficient and composable storage idiom for explicit decoupled data orchestration. In Proceedings of the 24th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XXIV), 2019."},{"key":"e_1_3_2_1_41_1","volume-title":"Eficient transitive closure of sparse matrices over closed semirings. Theoretical Computer Science, 354 ( 1 )","author":"Penn Gerald","year":"2006","unstructured":"Gerald Penn . Eficient transitive closure of sparse matrices over closed semirings. Theoretical Computer Science, 354 ( 1 ) , 2006 . Gerald Penn. Eficient transitive closure of sparse matrices over closed semirings. Theoretical Computer Science, 354 ( 1 ), 2006."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331562"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_3_2_1_44_1","volume-title":"Maximum matchings in general graphs through randomization. Journal of Algorithms, 10 ( 4 )","author":"Rabin Michael O","year":"1989","unstructured":"Michael O Rabin and Vijay V Vazirani . Maximum matchings in general graphs through randomization. Journal of Algorithms, 10 ( 4 ) , 1989 . Michael O Rabin and Vijay V Vazirani. Maximum matchings in general graphs through randomization. Journal of Algorithms, 10 ( 4 ), 1989."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358330"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2012.2193936"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355396"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01766-7"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/868979"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/16\/1\/071"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915220"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157329"},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings of the 21st Austrian Workshop on Microelectronics (Austrochip)","author":"Wolf Cliford","year":"2012","unstructured":"Cliford Wolf , Johann Glaser , and Johannes Kepler . Yosys-a free Verilog synthesis suite . In Proceedings of the 21st Austrian Workshop on Microelectronics (Austrochip) , 2012 . Cliford Wolf, Johann Glaser, and Johannes Kepler. Yosys-a free Verilog synthesis suite. In Proceedings of the 21st Austrian Workshop on Microelectronics (Austrochip), 2012."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330354"},{"key":"e_1_3_2_1_57_1","volume-title":"International Conference on High Performance Computing for Computational Science","author":"Yamazaki Ichitaro","year":"2010","unstructured":"Ichitaro Yamazaki and Xiaoye S Li . On techniques to improve robustness and scalability of a parallel hybrid linear solver . In International Conference on High Performance Computing for Computational Science , 2010 . Ichitaro Yamazaki and Xiaoye S Li. On techniques to improve robustness and scalability of a parallel hybrid linear solver. In International Conference on High Performance Computing for Computational Science, 2010."},{"key":"e_1_3_2_1_58_1","volume-title":"Proceedings of the 15th annual ACM-SIAM Symposium On Discrete Algorithms (SODA)","author":"Yuster Raphael","year":"2004","unstructured":"Raphael Yuster and Uri Zwick . Detecting short directed cycles using rectangular matrix multiplication and dynamic programming . In Proceedings of the 15th annual ACM-SIAM Symposium On Discrete Algorithms (SODA) , 2004 . Raphael Yuster and Uri Zwick. Detecting short directed cycles using rectangular matrix multiplication and dynamic programming. In Proceedings of the 15th annual ACM-SIAM Symposium On Discrete Algorithms (SODA), 2004."},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00030"}],"event":{"name":"ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","location":"Virtual USA","acronym":"ASPLOS '21","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446702","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3445814.3446702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:14Z","timestamp":1750195694000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,17]]},"references-count":59,"alternative-id":["10.1145\/3445814.3446702","10.1145\/3445814"],"URL":"https:\/\/doi.org\/10.1145\/3445814.3446702","relation":{},"subject":[],"published":{"date-parts":[[2021,4,17]]},"assertion":[{"value":"2021-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}