{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T14:57:42Z","timestamp":1776956262558,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,11,17]],"date-time":"2019-11-17T00:00:00Z","timestamp":1573948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Platform for Advanced Scientific Computing (PASC)"},{"name":"European Research Council (ERC)","award":["678880"],"award-info":[{"award-number":["678880"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,11,17]]},"DOI":"10.1145\/3295500.3356181","type":"proceedings-article","created":{"date-parts":[[2019,11,7]],"date-time":"2019-11-07T19:43:22Z","timestamp":1573155802000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":85,"title":["Red-blue pebbling revisited"],"prefix":"10.1145","author":[{"given":"Grzegorz","family":"Kwasniewski","sequence":"first","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marko","family":"Kabi\u0107","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maciej","family":"Besta","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joost","family":"VandeVondele","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raffaele","family":"Solc\u00e0","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Torsten","family":"Hoefler","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,11,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"R. C. Agarwal etal 1995. A three-dimensional approach to parallel matrix multiplication. IBM J. Res. Dev. (1995).  R. C. Agarwal et al. 1995. A three-dimensional approach to parallel matrix multiplication. IBM J. Res. Dev. (1995).","DOI":"10.1147\/rd.395.0575"},{"key":"e_1_3_2_1_2_1","unstructured":"Alok Aggarwal and S. Vitter Jeffrey. [n. d.]. The Input\/Output Complexity of Sorting and Related Problems. CACM (Sept. [n. d.]).  Alok Aggarwal and S. Vitter Jeffrey. [n. d.]. The Input\/Output Complexity of Sorting and Related Problems. CACM (Sept. [n. d.])."},{"key":"e_1_3_2_1_3_1","unstructured":"Lars Arge et al. 2008. In SPAA.  Lars Arge et al. 2008. In SPAA."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Ariful Azad et al. 2015. Parallel triangle counting and enumeration using matrix algebra. In IPDPSW.  Ariful Azad et al. 2015. Parallel triangle counting and enumeration using matrix algebra. In IPDPSW.","DOI":"10.1109\/IPDPSW.2015.75"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Grey Ballard et al. 2012. Graph expansion and communication costs of fast matrix multiplication. JACM (2012).  Grey Ballard et al. 2012. Graph expansion and communication costs of fast matrix multiplication. JACM (2012).","DOI":"10.1145\/1989493.1989495"},{"key":"e_1_3_2_1_6_1","unstructured":"Tal Ben-Nun and Torsten Hoefler. 2018. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. CoRR abs\/1802.09941 (2018).  Tal Ben-Nun and Torsten Hoefler. 2018. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. CoRR abs\/1802.09941 (2018)."},{"key":"e_1_3_2_1_7_1","unstructured":"Michael A Bender etal 2010. Optimal sparse matrix dense vector multiplication in the I\/O-model. TOCS (2010).  Michael A Bender et al. 2010. Optimal sparse matrix dense vector multiplication in the I\/O-model. TOCS (2010)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Maciej Besta et al. 2017. SlimSell: A Vectorizable Graph Representation for Breadth-First Search. In IPDPS.  Maciej Besta et al. 2017. SlimSell: A Vectorizable Graph Representation for Breadth-First Search. In IPDPS.","DOI":"10.1109\/IPDPS.2017.93"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Robert D Blumofe etal 1996. An analysis of dag-consistent distributed shared-memory algorithms. In SPAA.  Robert D Blumofe et al. 1996. An analysis of dag-consistent distributed shared-memory algorithms. In SPAA.","DOI":"10.1145\/237502.237574"},{"key":"e_1_3_2_1_10_1","unstructured":"Lynn Elliot Cannon. 1969. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. Dissertation.  Lynn Elliot Cannon. 1969. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. Dissertation."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Gregory J. Chaitin et al. 1981. Register allocation via coloring. Computer Languages (1981).  Gregory J. Chaitin et al. 1981. Register allocation via coloring. Computer Languages (1981).","DOI":"10.1016\/0096-0551(81)90048-5"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"S. M. Chan. 2013. Just a Pebble Game. In CCC.  S. M. Chan. 2013. Just a Pebble Game. In CCC.","DOI":"10.1109\/CCC.2013.22"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Fran\u00e0oise Chatelin. 2012. Eigenvalues of Matrices: Revised Edition. Siam.  Fran\u00e0oise Chatelin. 2012. Eigenvalues of Matrices: Revised Edition. Siam.","DOI":"10.1137\/1.9781611972467"},{"key":"e_1_3_2_1_14_1","unstructured":"Jaeyoung Choi et al. 1992. ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers. In FRONTIERS.  Jaeyoung Choi et al. 1992. ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers. In FRONTIERS."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Jaeyoung Choi et al. 1996. Design and Implementation of the ScaLAPACK LU QR and Cholesky Factorization Routines. Sci. Program. (1996).  Jaeyoung Choi et al. 1996. Design and Implementation of the ScaLAPACK LU QR and Cholesky Factorization Routines. Sci. Program. (1996).","DOI":"10.1155\/1996\/483083"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"J. Choi et al. 1996. ScaLAPACK: a portable linear algebra library for distributed memory computers --- design issues and performance. Comp. Phys. Comm. (1996).  J. Choi et al. 1996. ScaLAPACK: a portable linear algebra library for distributed memory computers --- design issues and performance. Comp. Phys. Comm. (1996).","DOI":"10.1007\/3-540-60902-4_12"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4330060702"},{"key":"e_1_3_2_1_18_1","unstructured":"Thomas H Cormen Charles E Leiserson Ronald L Rivest and Clifford Stein. 2009. Introduction to algorithms. MIT press.  Thomas H Cormen Charles E Leiserson Ronald L Rivest and Clifford Stein. 2009. Introduction to algorithms. MIT press."},{"key":"e_1_3_2_1_19_1","volume-title":"High-Performance Computing","author":"D'Alberto Paolo"},{"key":"e_1_3_2_1_20_1","unstructured":"Alain Darte. 1999. On the complexity of loop fusion. In PACT.  Alain Darte. 1999. On the complexity of loop fusion. In PACT."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Mauro Del Ben etal 2015. Enabling simulation at the fifth rung of DFT: Large scale RPA calculations with excellent time to solution. Comp. Phys. Comm. (2015).  Mauro Del Ben et al. 2015. Enabling simulation at the fifth rung of DFT: Large scale RPA calculations with excellent time to solution. Comp. Phys. Comm. (2015).","DOI":"10.1016\/j.cpc.2014.10.021"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"J. Demmel et al. 2013. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In IPDPS.  J. Demmel et al. 2013. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In IPDPS.","DOI":"10.1109\/IPDPS.2013.80"},{"key":"e_1_3_2_1_23_1","unstructured":"V. Elango et al. 2013. Data access complexity: The red\/blue pebble game revisited. Technical Report.  V. Elango et al. 2013. Data access complexity: The red\/blue pebble game revisited. Technical Report."},{"key":"e_1_3_2_1_24_1","unstructured":"Geoffrey C Fox. 1988. Solving problems on concurrent processors. (1988).  Geoffrey C Fox. 1988. Solving problems on concurrent processors. (1988)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Geoffrey C Fox Steve W Otto and Anthony JG Hey. 1987. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel computing 4 1 (1987) 17--31.  Geoffrey C Fox Steve W Otto and Anthony JG Hey. 1987. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel computing 4 1 (1987) 17--31.","DOI":"10.1016\/0167-8191(87)90060-3"},{"key":"e_1_3_2_1_26_1","unstructured":"M. Frigo et al. 1999. Cache-oblivious algorithms. In FOCS.  M. Frigo et al. 1999. Cache-oblivious algorithms. In FOCS."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"R. Gerstenberger et al. 2013. Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided. In SC.  R. Gerstenberger et al. 2013. Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided. In SC.","DOI":"10.1145\/2503210.2503286"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"John R. Gilbert et al. 1979. The Pebbling Problem is Complete in Polynomial Space. In STOC.  John R. Gilbert et al. 1979. The Pebbling Problem is Complete in Polynomial Space. In STOC.","DOI":"10.1145\/800135.804418"},{"key":"e_1_3_2_1_29_1","unstructured":"Gero Greiner. 2012. Sparse matrix computations and their I\/O complexity. Ph.D. Dissertation. Technische Universit\u00e4t M\u00fcnchen.  Gero Greiner. 2012. Sparse matrix computations and their I\/O complexity. Ph.D. Dissertation. Technische Universit\u00e4t M\u00fcnchen."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"A. Gupta and V. Kumar. 1993. Scalability of Parallel Algorithms for Matrix Multiplication. In ICPP.  A. Gupta and V. Kumar. 1993. Scalability of Parallel Algorithms for Matrix Multiplication. In ICPP.","DOI":"10.1109\/ICPP.1993.160"},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. IEEE Press, 47","author":"Haidar Azzam","year":"2018"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"T. Hoefler et al. 2015. Remote Memory Access Programming in MPI-3. TOPC (2015).  T. Hoefler et al. 2015. Remote Memory Access Programming in MPI-3. TOPC (2015).","DOI":"10.1145\/2780584"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Dror Irony et al. 2004. Communication Lower Bounds for Distributed-memory Matrix Multiplication. JPDC (2004).  Dror Irony et al. 2004. Communication Lower Bounds for Distributed-memory Matrix Multiplication. JPDC (2004).","DOI":"10.1016\/j.jpdc.2004.03.021"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Hong Jia-Wei and Hsiang-Tsung Kung. 1981. I\/O complexity: The red-blue pebble game. In STOC.  Hong Jia-Wei and Hsiang-Tsung Kung. 1981. I\/O complexity: The red-blue pebble game. In STOC.","DOI":"10.1145\/800076.802486"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Ken Kennedy and Kathryn S McKinley. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In LCPC.  Ken Kennedy and Kathryn S McKinley. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In LCPC.","DOI":"10.1007\/3-540-57659-2_18"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Jeremy Kepner et al. 2016. Mathematical foundations of the GraphBLAS. arXiv:1606.05790 (2016).  Jeremy Kepner et al. 2016. Mathematical foundations of the GraphBLAS. arXiv:1606.05790 (2016).","DOI":"10.1109\/HPEC.2016.7761646"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Penporn Koanantakool et al. 2016. Communication-avoiding parallel sparse-dense matrix-matrix multiplication. In IPDPS.  Penporn Koanantakool et al. 2016. Communication-avoiding parallel sparse-dense matrix-matrix multiplication. In IPDPS.","DOI":"10.1109\/IPDPS.2016.117"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Alfio Lazzaro et al. 2017. Increasing the efficiency of sparse matrix-matrix multiplication with a 2.5 D algorithm and one-sided MPI. In PASC.  Alfio Lazzaro et al. 2017. Increasing the efficiency of sparse matrix-matrix multiplication with a 2.5 D algorithm and one-sided MPI. In PASC.","DOI":"10.1145\/3093172.3093228"},{"key":"e_1_3_2_1_39_1","unstructured":"Hyuk-Jae Lee etal 1997. Generalized Cannon's Algorithm for Parallel Matrix Multiplication. In ICS.  Hyuk-Jae Lee et al. 1997. Generalized Cannon's Algorithm for Parallel Matrix Multiplication. In ICS."},{"key":"e_1_3_2_1_40_1","unstructured":"Quanquan Liu. 2018. Red-Blue and Standard Pebble Games : Complexity and Applications in the Sequential and Parallel Models.  Quanquan Liu. 2018. Red-Blue and Standard Pebble Games : Complexity and Applications in the Sequential and Parallel Models."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Carl D Meyer. 2000. Matrix analysis and applied linear algebra. SIAM.  Carl D Meyer. 2000. Matrix analysis and applied linear algebra. SIAM.","DOI":"10.1137\/1.9780898719512"},{"key":"e_1_3_2_1_42_1","unstructured":"Sraban Kumar Mohanty. 2010. I\/O Efficient Algorithms for Matrix Computations. CoRR abs\/1006.1307 (2010).  Sraban Kumar Mohanty. 2010. I\/O Efficient Algorithms for Matrix Computations. CoRR abs\/1006.1307 (2010)."},{"key":"e_1_3_2_1_43_1","unstructured":"Andrew Y Ng etal 2002. On spectral clustering: Analysis and an algorithm. In NIPS.  Andrew Y Ng et al. 2002. On spectral clustering: Analysis and an algorithm. In NIPS."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Donald W. Rogers. 2003. Computational Chemistry Using the PC. John Wiley & Sons Inc.  Donald W. Rogers. 2003. Computational Chemistry Using the PC. John Wiley & Sons Inc.","DOI":"10.1002\/0471474908"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Jacob Scott et al. 2015. Matrix multiplication I\/O-complexity by path routing. In SPAA.  Jacob Scott et al. 2015. Matrix multiplication I\/O-complexity by path routing. In SPAA.","DOI":"10.1145\/2755573.2755594"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Ravi Sethi. 1973. Complete Register Allocation Problems. In STOC.  Ravi Sethi. 1973. Complete Register Allocation Problems. In STOC.","DOI":"10.1145\/800125.804049"},{"key":"e_1_3_2_1_47_1","unstructured":"Tyler Michael Smith and Robert A. van de Geijn. 2017. Pushing the Bounds for Matrix-Matrix Multiplication. CoRR abs\/1702.02017 (2017).  Tyler Michael Smith and Robert A. van de Geijn. 2017. Pushing the Bounds for Matrix-Matrix Multiplication. CoRR abs\/1702.02017 (2017)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807654"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Edgar Solomonik et al. 2013. Cyclops tensor framework: Reducing communication and eliminating load imbalance in massively parallel contractions. In IPDPS.  Edgar Solomonik et al. 2013. Cyclops tensor framework: Reducing communication and eliminating load imbalance in massively parallel contractions. In IPDPS.","DOI":"10.1109\/IPDPS.2013.112"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Edgar Solomonik et al. 2016. Trade-Offs Between Synchronization Communication and Computation in Parallel Linear Algebra omputations. TOPC (2016).  Edgar Solomonik et al. 2016. Trade-Offs Between Synchronization Communication and Computation in Parallel Linear Algebra omputations. TOPC (2016).","DOI":"10.1145\/2897188"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"E. Solomonik et al. 2017. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication. In SC.  E. Solomonik et al. 2017. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication. In SC.","DOI":"10.1145\/3126908.3126971"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Edgar Solomonik and James Demmel. 2011. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms. In EuroPar.  Edgar Solomonik and James Demmel. 2011. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms. In EuroPar.","DOI":"10.1007\/978-3-642-23397-5_10"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Volker Strassen. 1969. Gaussian Elimination is Not Optimal. Numer. Math. (1969).  Volker Strassen. 1969. Gaussian Elimination is Not Optimal. Numer. Math. (1969).","DOI":"10.1007\/BF02165411"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Sivan Toledo. 1999. A survey of out-of-core algorithms in numerical linear algebra. External Memory Algorithms and Visualization (1999).  Sivan Toledo. 1999. A survey of out-of-core algorithms in numerical linear algebra. External Memory Algorithms and Visualization (1999).","DOI":"10.1090\/dimacs\/050\/09"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199704)9:4<255::AID-CPE250>3.0.CO;2-2"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/568014.379590"},{"key":"e_1_3_2_1_57_1","unstructured":"Michael E. Wolfand Monica S. Lam. 1991. A Data Locality Optimizing Algorithm. In PLDI.  Michael E. Wolfand Monica S. Lam. 1991. A Data Locality Optimizing Algorithm. In PLDI."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"crossref","unstructured":"M. Wolfe. 1989. More Iteration Space Tiling. In SC.  M. Wolfe. 1989. More Iteration Space Tiling. In SC.","DOI":"10.1145\/76263.76337"},{"key":"e_1_3_2_1_59_1","unstructured":"Nan Xiong. 2018. Optimizing Tall-and-Skinny Matrix-Matrix Multiplication on GPUs. Ph.D. Dissertation. UC Riverside.  Nan Xiong. 2018. Optimizing Tall-and-Skinny Matrix-Matrix Multiplication on GPUs. Ph.D. Dissertation. UC Riverside."},{"key":"e_1_3_2_1_60_1","unstructured":"Qinqing Zheng and John D. Lafferty. 2016. Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent. CoRR (2016).  Qinqing Zheng and John D. Lafferty. 2016. Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent. CoRR (2016)."}],"event":{"name":"SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis","location":"Denver Colorado","acronym":"SC '19","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"]},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3295500.3356181","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3295500.3356181","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:13Z","timestamp":1750208533000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3295500.3356181"}},"subtitle":["near optimal parallel matrix-matrix multiplication"],"short-title":[],"issued":{"date-parts":[[2019,11,17]]},"references-count":60,"alternative-id":["10.1145\/3295500.3356181","10.1145\/3295500"],"URL":"https:\/\/doi.org\/10.1145\/3295500.3356181","relation":{},"subject":[],"published":{"date-parts":[[2019,11,17]]},"assertion":[{"value":"2019-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}