{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T05:18:55Z","timestamp":1768108735024,"version":"3.49.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:00:00Z","timestamp":1668643200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>GraphBLASis a recent standard that allows the expression of graph algorithms in the language of linear algebra and enables automatic code parallelization and optimization. GraphBLAS operations are memory bound and may benefit from data locality optimizations enabled by nonblocking execution. However, nonblocking execution remains under-evaluated. In this article, we present a novel design and implementation that investigates nonblocking execution in GraphBLAS. Lazy evaluation enables runtime optimizations that improve data locality, and dynamic data dependence analysis identifies operations that may reuse data in cache. The nonblocking execution of an arbitrary number of operations results in dynamic parallelism, and the performance of the nonblocking execution depends on two parameters, which are automatically determined, at run-time, based on a proposed analytic model. The evaluation confirms the importance of nonblocking execution for various matrices of three algorithms, by showing up to 4.11\u00d7 speedup over blocking execution as a result of better cache utilization. The proposed analytic model makes the nonblocking execution reach up to 5.13\u00d7 speedup over the blocking execution. The fully automatic performance is very close to that obtained by using the best manual configuration for both small and large matrices. Finally, the evaluation includes a comparison with other state-of-the-art frameworks for numerical linear algebra programming that employ parallel execution and similar optimizations to those discussed in this work, and the presented nonblocking execution reaches up to 16.1\u00d7 speedup over the state of the art.<\/jats:p>","DOI":"10.1145\/3561652","type":"journal-article","created":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T11:51:59Z","timestamp":1662465119000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5235-8499","authenticated-orcid":false,"given":"Aristeidis","family":"Mastoras","sequence":"first","affiliation":[{"name":"Computing Systems Laboratory, Zurich Research Center, Huawei Technologies Switzerland AG, Z\u00fcrich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8012-3331","authenticated-orcid":false,"given":"Sotiris","family":"Anagnostidis","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich, Z\u00fcrich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8842-3689","authenticated-orcid":false,"given":"Albert-Jan N.","family":"Yzelman","sequence":"additional","affiliation":[{"name":"Computing Systems Laboratory, Zurich Research Center, Huawei Technologies Switzerland AG, Z\u00fcrich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,17]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2021. ALP\/GraphBLAS. Retrieved on June 22 2022. https:\/\/github.com\/Algebraic-Programming\/ALP."},{"key":"e_1_3_1_3_2","unstructured":"2022. Searchable Abstracts Document Conference on Parallel Processing for Scientific Computing February 23\u201326 2022. Retrieved on August 17 2022. https:\/\/www.siam.org\/Portals\/0\/Conferences\/PP22\/PP22_ABSTRACTS.pdf."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.5555\/578775"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976229.14"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/197405.197406"},{"key":"e_1_3_1_7_2","first-page":"291","volume-title":"Runtime Code Generation in C++ as a Foundation for Domain-Specific Optimisation","author":"Beckmann Olav","year":"2004","unstructured":"Olav Beckmann, Alastair Houghton, Michael Mellor, and Paul H. J. Kelly. 2004. Runtime Code Generation in C++ as a Foundation for Domain-Specific Optimisation. Springer, Berlin, 291\u2013306."},{"key":"e_1_3_1_8_2","unstructured":"Benjamin Brock Ayd\u0131n Bulu\u00e7 Timothy Mattson Scott McMillan and Jose Moreira. 2021. The GraphBLAS C API Specification version 2.0. http:\/\/www.graphblas.org."},{"key":"e_1_3_1_9_2","unstructured":"Ayd\u0131n Bulu\u00e7 Timothy Mattson Scott McMillan Jose Moreira and Carl Yang. 2017. The GraphBLAS C API Specification version 1.0. http:\/\/www.graphblas.org."},{"key":"e_1_3_1_10_2","first-page":"310","volume-title":"2018 IEEE International Parallel and Distributed Processing Symposium Workshops","author":"Chamberlin Jesse","year":"2018","unstructured":"Jesse Chamberlin, Marcin Zalewski, Scott McMillan, and Andrew Lumsdaine. 2018. PyGB: GraphBLAS DSL in Python with dynamic compilation into efficient C++. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops. 310\u2013319."},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1007\/978-3-540-85261-2_12","volume-title":"Languages and Compilers for Parallel Computing","author":"Cornwall Jay L. T.","year":"2008","unstructured":"Jay L. T. Cornwall, Paul H. J. Kelly, Phil Parsonage, and Bruno Nicoletti. 2008. Explicit dependence metadata in an active visual effects library. In Languages and Compilers for Parallel Computing. 172\u2013186."},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1007\/3-540-56891-3_12","volume-title":"PARLE\u201993 Parallel Architectures and Languages Europe","author":"Darlington J.","year":"1993","unstructured":"J. Darlington, A. J. Field, P. G. Harrison, P. H. J. Kelly, D. W. N. Sharp, Q. Wu, and R. L. While. 1993. Parallel programming using skeleton functions. In PARLE\u201993 Parallel Architectures and Languages Europe. Springer, Berlin, 146\u2013160."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3322125"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2019.8916550"},{"issue":"1","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2049662.2049663","article-title":"The University of Florida sparse matrix collection","volume":"38","author":"Davis Timothy A.","year":"2011","unstructured":"Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Software 38, 1, Article 1 (Nov. 2011), 1:1\u20131:25 pages.","journal-title":"ACM Trans. Math. Software"},{"key":"e_1_3_1_16_2","volume-title":"GNU Scientific Library Reference Manual (3rd Ed.)","author":"Galassi M.","year":"2009","unstructured":"M. Galassi et\u00a0al. 2009. GNU Scientific Library Reference Manual (3rd Ed.). http:\/\/www.gnu.org\/software\/gsl\/."},{"key":"e_1_3_1_17_2","unstructured":"Ga\u00ebl Guennebaud et\u00a0al. 2010. Eigen v3. http:\/\/eigen.tuxfamily.org."},{"key":"e_1_3_1_18_2","first-page":"981","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Heinecke Alexander","year":"2016","unstructured":"Alexander Heinecke, Greg Henry, Maxwell Hutchinson, and Hans Pabst. 2016. LIBXSMM: Accelerating small matrix multiplications by runtime code generation. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 981\u2013991."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2019.8916336"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/2039367"},{"key":"e_1_3_1_21_2","volume-title":"Google\u2019s PageRank and Beyond","author":"Langville Amy N.","year":"2011","unstructured":"Amy N. Langville and Carl D. Meyer. 2011. Google\u2019s PageRank and Beyond. Princeton University Press."},{"key":"e_1_3_1_22_2","article-title":"SNAP Datasets: Stanford Large Network Dataset Collection","author":"Leskovec Jure","year":"2014","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data.","journal-title":"http:\/\/snap.stanford.edu\/data"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW55747.2022.00051"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3363815"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307411"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2017.8091095"},{"key":"e_1_3_1_27_2","first-page":"298","volume-title":"2018 IEEE International Parallel and Distributed Processing Symposium Workshops","author":"Moreira Jos\u00e9 E.","year":"2018","unstructured":"Jos\u00e9 E. Moreira, Manoj Kumar, and William P. Horn. 2018. Implementing the GraphBLAS C API. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops. 298\u2013309."},{"key":"e_1_3_1_28_2","unstructured":"OpenMP Architecture Review Board Version 5.2. 2021. OpenMP Application Program Interface. Retrieved on April 22 2022. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5-2.pdf."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.scico.2008.06.002"},{"key":"e_1_3_1_30_2","unstructured":"Wijnand Suijlen and A. N. Yzelman. 2019. Lightweight Parallel Foundations: A model-compliant communication layer. arxiv:1906.03196 [cs.DC]"},{"key":"e_1_3_1_31_2","article-title":"GraphBLAST: A high-performance linear algebra-based graph framework on the GPU","author":"Yang Carl","year":"2019","unstructured":"Carl Yang, Aydin Buluc, and John D. Owens. 2019. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU. arXiv preprint arXiv:1908.01407 (2019).","journal-title":"arXiv preprint arXiv:1908.01407"},{"key":"e_1_3_1_32_2","unstructured":"A. N. Yzelman D. Di Nardo J. M. Nash and W. J. Suijlen. 2020. A C++ GraphBLAS: specification implementation parallelisation and evaluation. (2020). http:\/\/albert-jan.yzelman.net\/PDFs\/yzelman20.pdf. Preprint."},{"key":"e_1_3_1_33_2","first-page":"912","volume-title":"2016 IEEE International Parallel and Distributed Processing Symposium Workshops","author":"Zhang Peter","year":"2016","unstructured":"Peter Zhang, Marcin Zalewski, Andrew Lumsdaine, Samantha Misurda, and Scott McMillan. 2016. GBTL-CUDA: Graph algorithms and primitives for GPUs. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops. 912\u2013920."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561652","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3561652","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:24Z","timestamp":1750186944000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3561652"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,17]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3561652"],"URL":"https:\/\/doi.org\/10.1145\/3561652","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,17]]},"assertion":[{"value":"2022-07-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-31","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}