{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T03:05:46Z","timestamp":1769915146905,"version":"3.49.0"},"reference-count":216,"publisher":"Cambridge University Press (CUP)","license":[{"start":{"date-parts":[[2014,5,12]],"date-time":"2014-05-12T00:00:00Z","timestamp":1399852800000},"content-version":"unspecified","delay-in-days":11,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Acta Numerica"],"published-print":{"date-parts":[[2014,5]]},"abstract":"<jats:p>The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, <jats:italic>communication<\/jats:italic>, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical <jats:italic>(O(n<\/jats:italic><jats:sup>3<\/jats:sup>)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.<\/jats:p>","DOI":"10.1017\/s0962492914000038","type":"journal-article","created":{"date-parts":[[2014,5,12]],"date-time":"2014-05-12T09:00:04Z","timestamp":1399885204000},"page":"1-155","source":"Crossref","is-referenced-by-count":80,"title":["Communication lower bounds and optimal algorithms for numerical linear algebra"],"prefix":"10.1017","volume":"23","author":[{"given":"G.","family":"Ballard","sequence":"first","affiliation":[]},{"given":"E.","family":"Carson","sequence":"additional","affiliation":[]},{"given":"J.","family":"Demmel","sequence":"additional","affiliation":[]},{"given":"M.","family":"Hoemmen","sequence":"additional","affiliation":[]},{"given":"N.","family":"Knight","sequence":"additional","affiliation":[]},{"given":"O.","family":"Schwartz","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2014,5,12]]},"reference":[{"key":"S0962492914000038_ref101","first-page":"1","volume-title":"Proc. International Conference for High Performance Computing, Networking, Storage and Analysis: SC '12","author":"Georganas","year":"2012"},{"key":"S0962492914000038_ref110","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479895284944"},{"key":"S0962492914000038_ref216","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.08.004"},{"key":"S0962492914000038_ref014","first-page":"84","volume-title":"Proc. Fifth SIAM Conference on Parallel Processing for Scientific Computing","author":"Bai","year":"1991"},{"key":"S0962492914000038_ref160","doi-asserted-by":"publisher","DOI":"10.1007\/s00200-007-0038-9"},{"key":"S0962492914000038_ref098","first-page":"285","volume-title":"Proc. 40th Annual Symposium on Foundations of Computer Science: FOCS '99","author":"Frigo","year":"1999"},{"key":"S0962492914000038_ref038","doi-asserted-by":"publisher","DOI":"10.1007\/s002240000131"},{"key":"S0962492914000038_ref097","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-007-9098-2"},{"key":"S0962492914000038_ref105","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2005.08.009"},{"key":"S0962492914000038_ref152","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9904-1949-09320-5"},{"key":"S0962492914000038_ref020","volume-title":"Communication-optimal parallel and sequential eigenvalue and singular value algorithms","author":"Ballard"},{"key":"S0962492914000038_ref202","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827599353865"},{"key":"S0962492914000038_ref061","doi-asserted-by":"publisher","DOI":"10.21236\/ADA584726"},{"key":"S0962492914000038_ref168","doi-asserted-by":"publisher","DOI":"10.1007\/BF02017352"},{"key":"S0962492914000038_ref192","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(95)00057-7"},{"key":"S0962492914000038_ref069","first-page":"4","volume-title":"Proc. 2008 ACM\/IEEE Conference on Supercomputing","author":"Datta","year":"2008"},{"key":"S0962492914000038_ref159","volume-title":"Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD","author":"Nakatsukasa","year":"2012"},{"key":"S0962492914000038_ref209","volume-title":"Proc. 2014 IEEE International Parallel and Distributed Processing Symposium: IPDPS '14","author":"Williams","year":"2014"},{"key":"S0962492914000038_ref046","unstructured":"B\u00f6rm S. and Grasedyck L. (2006), HLib package. http:\/\/www.hlib.org\/hlib.html"},{"key":"S0962492914000038_ref015","doi-asserted-by":"publisher","DOI":"10.1093\/imanum\/14.4.563"},{"key":"S0962492914000038_ref141","first-page":"15","volume-title":"Roczniki Polskiego Towarzystwa Matematycznego, Seria III: Matematyka Stosowana II","author":"Kielbasinski","year":"1974"},{"key":"S0962492914000038_ref129","doi-asserted-by":"publisher","DOI":"10.1137\/0710033"},{"key":"S0962492914000038_ref037","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(89)90091-4"},{"key":"S0962492914000038_ref107","first-page":"171","volume-title":"High-Speed Computing: Scientific Applications and Algorithm Design","author":"Golub","year":"1988"},{"key":"S0962492914000038_ref215","first-page":"93","volume-title":"Proc. 19th Annual ACM Symposium on Parallel Algorithms and Architectures: SPAA '07","author":"Yotov","year":"2007"},{"key":"S0962492914000038_ref211","doi-asserted-by":"publisher","DOI":"10.1145\/2213977.2214056"},{"key":"S0962492914000038_ref049","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479801384585"},{"key":"S0962492914000038_ref210","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.006"},{"key":"S0962492914000038_ref207","doi-asserted-by":"publisher","DOI":"10.1137\/0909010"},{"key":"S0962492914000038_ref212","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44520-X_108"},{"key":"S0962492914000038_ref075","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-007-0061-6"},{"key":"S0962492914000038_ref205","volume-title":"Proc. of SciDAC 2005, J. of Physics Conference Series","author":"Vuduc","year":"2005"},{"key":"S0962492914000038_ref204","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48311-X_158"},{"key":"S0962492914000038_ref198","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479896297744"},{"key":"S0962492914000038_ref200","doi-asserted-by":"publisher","DOI":"10.1137\/0611023"},{"key":"S0962492914000038_ref093","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0095328"},{"key":"S0962492914000038_ref088","doi-asserted-by":"publisher","DOI":"10.1145\/42288.42291"},{"key":"S0962492914000038_ref203","volume-title":"Minimizing inner product data dependencies in conjugate gradient iteration","author":"Rosendale","year":"1983"},{"key":"S0962492914000038_ref027","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"S0962492914000038_ref153","doi-asserted-by":"publisher","DOI":"10.1007\/PL00008264"},{"key":"S0962492914000038_ref213","volume-title":"PARA08, Trondheim, Norway","author":"Wolf","year":"2008"},{"key":"S0962492914000038_ref199","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-99-01171-0"},{"key":"S0962492914000038_ref191","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45545-0_23"},{"key":"S0962492914000038_ref081","volume-title":"Avoiding communication in computing Krylov subspaces","author":"Demmel","year":"2007"},{"key":"S0962492914000038_ref080","doi-asserted-by":"publisher","DOI":"10.1137\/080731992"},{"key":"S0962492914000038_ref086","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"S0962492914000038_ref079","unstructured":"Demmel J. , Grigori L. , Hoemmen M. and Langou J. (2008 a), Communication-avoiding parallel and sequential QR and LU factorizations: Theory and practice. LAPACK Working Note."},{"key":"S0962492914000038_ref001","doi-asserted-by":"publisher","DOI":"10.1007\/BF01931804"},{"key":"S0962492914000038_ref146","doi-asserted-by":"crossref","DOI":"10.6028\/jres.045.026","volume-title":"An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators","author":"Lanczos","year":"1950"},{"key":"S0962492914000038_ref026","first-page":"1","volume-title":"Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '11","author":"Ballard","year":"2011"},{"key":"S0962492914000038_ref083","doi-asserted-by":"publisher","DOI":"10.1137\/070688778"},{"key":"S0962492914000038_ref067","volume-title":"Frontiers in Massive Data Analysis","year":"2013"},{"key":"S0962492914000038_ref174","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0030842"},{"key":"S0962492914000038_ref077","first-page":"649","volume-title":"Proc. 27th IEEE International Parallel and Distributed Processing Symposium: IPDPS '13","author":"Demmel","year":"2013"},{"key":"S0962492914000038_ref194","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005051521"},{"key":"S0962492914000038_ref189","doi-asserted-by":"publisher","DOI":"10.1137\/070682563"},{"key":"S0962492914000038_ref028","doi-asserted-by":"publisher","DOI":"10.1145\/2395116.2395121"},{"key":"S0962492914000038_ref169","doi-asserted-by":"publisher","DOI":"10.1145\/1916461.1916462"},{"key":"S0962492914000038_ref106","volume-title":"Matrix Computations","author":"Golub","year":"1996"},{"key":"S0962492914000038_ref171","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"S0962492914000038_ref055","doi-asserted-by":"publisher","DOI":"10.1137\/120881191"},{"key":"S0962492914000038_ref112","doi-asserted-by":"publisher","DOI":"10.1007\/BF02510248"},{"key":"S0962492914000038_ref161","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2004.44"},{"key":"S0962492914000038_ref092","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_61"},{"key":"S0962492914000038_ref053","unstructured":"Cannon L. (1969), A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University."},{"key":"S0962492914000038_ref057","volume-title":"Proc. 15th IEEE International Parallel and Distributed Processing Symposium: IPDPS '01","author":"Catalyiirek","year":"2001"},{"key":"S0962492914000038_ref050","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1977-0428694-0"},{"key":"S0962492914000038_ref162","first-page":"459","volume-title":"Acta Numerica","volume":"4","author":"Parlett","year":"1995"},{"key":"S0962492914000038_ref025","doi-asserted-by":"publisher","DOI":"10.1137\/090760969"},{"key":"S0962492914000038_ref181","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-006-0042-1"},{"key":"S0962492914000038_ref090","first-page":"21","article-title":"Cache optimization for structured and unstructured grid multigrid","volume":"10","author":"Douglas","year":"2000","journal-title":"Electron. Trans. Numer. Anal."},{"key":"S0962492914000038_ref005","first-page":"1116","article-title":"The input\/output complexity of sorting and related problems","volume":"31","author":"Aggarwal","year":"1988","journal-title":"Comm. Assoc. Comput. Mach."},{"key":"S0962492914000038_ref124","unstructured":"Hackbusch W. (2006), Hierarchische Matrizen: Algorithmen und Analysis. http:\/\/www.mis.mpg.de\/scicomp\/Fulltext\/hmvorlesung.ps"},{"key":"S0962492914000038_ref058","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1206"},{"key":"S0962492914000038_ref052","volume-title":"Autotuning sparse matrix-vector multiplication for multicore","author":"Byun","year":"2012"},{"key":"S0962492914000038_ref149","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(83)90105-6"},{"key":"S0962492914000038_ref172","doi-asserted-by":"publisher","DOI":"10.1137\/0907058"},{"key":"S0962492914000038_ref004","doi-asserted-by":"publisher","DOI":"10.1147\/rd.386.0673"},{"key":"S0962492914000038_ref145","volume-title":"Enabling code generation within the sparse polyhedral framework","author":"LaMielle","year":"2010"},{"key":"S0962492914000038_ref008","volume-title":"LAPACK Users' Guide","author":"Anderson","year":"1992"},{"key":"S0962492914000038_ref060","doi-asserted-by":"publisher","DOI":"10.1177\/109434200101500106"},{"key":"S0962492914000038_ref016","unstructured":"Ballard G. (2013), Avoiding communication in dense linear algebra. PhD thesis, EECS Department, UC Berkeley."},{"key":"S0962492914000038_ref108","volume-title":"Getting up to Speed: The Future of Supercomputing","author":"Graham","year":"2004"},{"key":"S0962492914000038_ref054","doi-asserted-by":"publisher","DOI":"10.1137\/120893057"},{"key":"S0962492914000038_ref056","doi-asserted-by":"publisher","DOI":"10.1109\/71.780863"},{"key":"S0962492914000038_ref045","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"S0962492914000038_ref012","volume-title":"A test matrix collection for non-Hermitian eigenvalue problems","author":"Bai","year":"1997"},{"key":"S0962492914000038_ref009","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.15"},{"key":"S0962492914000038_ref144","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"S0962492914000038_ref175","volume-title":"Scalable universal matrix multiplication algorithms: 2D and 3D variations on a theme","author":"Schatz","year":"2013"},{"key":"S0962492914000038_ref065","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1994.365694"},{"key":"S0962492914000038_ref102","doi-asserted-by":"publisher","DOI":"10.1137\/0710032"},{"key":"S0962492914000038_ref059","doi-asserted-by":"publisher","DOI":"10.1137\/S106482759833913X"},{"key":"S0962492914000038_ref132","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40104-6_36"},{"key":"S0962492914000038_ref006","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(90)90188-N"},{"key":"S0962492914000038_ref011","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719581"},{"key":"S0962492914000038_ref123","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479897331862"},{"key":"S0962492914000038_ref096","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088197"},{"key":"S0962492914000038_ref100","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676393"},{"key":"S0962492914000038_ref048","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479801384573"},{"key":"S0962492914000038_ref010","doi-asserted-by":"publisher","DOI":"10.1090\/qam\/42792"},{"key":"S0962492914000038_ref021","first-page":"257","volume-title":"Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '11","author":"Ballard","year":"2011"},{"key":"S0962492914000038_ref076","first-page":"261","volume-title":"Proc. 27th IEEE International Parallel and Distributed Processing Symposium: IPDPS '13","author":"Demmel","year":"2013"},{"key":"S0962492914000038_ref073","volume-title":"An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra","author":"Demmel","year":"2013"},{"key":"S0962492914000038_ref116","first-page":"79","volume-title":"Proc. 22nd ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '10","author":"Grigori","year":"2010"},{"key":"S0962492914000038_ref091","first-page":"1075","volume-title":"Proc. 27th IEEE International Parallel and Distributed Processing Symposium: IPDPS '13","author":"Driscoll","year":"2013"},{"key":"S0962492914000038_ref040","doi-asserted-by":"publisher","DOI":"10.1137\/0908009"},{"key":"S0962492914000038_ref034","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971538"},{"key":"S0962492914000038_ref114","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12200-2_14"},{"key":"S0962492914000038_ref023","volume-title":"Avoiding communication in successive band reduction","author":"Ballard","year":"2013"},{"key":"S0962492914000038_ref208","doi-asserted-by":"publisher","DOI":"10.1145\/321105.321107"},{"key":"S0962492914000038_ref182","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23397-5_10"},{"key":"S0962492914000038_ref018","first-page":"895","volume-title":"Proc. 27th IEEE International Parallel Distributed Processing Symposium: IPDPS '13","author":"Ballard","year":"2013"},{"key":"S0962492914000038_ref119","doi-asserted-by":"publisher","DOI":"10.1145\/1055531.1055534"},{"key":"S0962492914000038_ref031","first-page":"193","volume-title":"Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '12","author":"Ballard","year":"2012"},{"key":"S0962492914000038_ref117","doi-asserted-by":"publisher","DOI":"10.1137\/100788926"},{"key":"S0962492914000038_ref113","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15155-2_35"},{"key":"S0962492914000038_ref032","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-34862-4_2"},{"key":"S0962492914000038_ref041","doi-asserted-by":"publisher","DOI":"10.1145\/365723.365736"},{"key":"S0962492914000038_ref072","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971446"},{"key":"S0962492914000038_ref024","volume-title":"Proc. 2014 IEEE International Parallel and Distributed Processing Symposium: IPDPS '14","author":"Ballard","year":"2014"},{"key":"S0962492914000038_ref156","first-page":"1","volume-title":"2012 19th International Conference on High Performance Computing: HiPC","author":"Mohanty","year":"2012"},{"key":"S0962492914000038_ref030","first-page":"77","volume-title":"Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '12","author":"Ballard","year":"2012"},{"key":"S0962492914000038_ref044","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719642"},{"key":"S0962492914000038_ref197","unstructured":"Toledo S. (1995), Quantitative performance modeling of scientific computations and creating locality in numerical algorithms. PhD thesis, MIT."},{"key":"S0962492914000038_ref185","first-page":"813","volume-title":"Proc. 27th IEEE International Parallel and Distributed Processing Symposium: IPDPS '13","author":"Solomonik","year":"2013"},{"key":"S0962492914000038_ref039","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-40064-8_6"},{"key":"S0962492914000038_ref151","volume-title":"Proc. International Conference on High Performance Computing, Networking, Storage and Analysis: SC '12","author":"Lipshitz","year":"2012"},{"key":"S0962492914000038_ref019","first-page":"222","volume-title":"Proc. 25th ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '13","author":"Ballard","year":"2013"},{"key":"S0962492914000038_ref179","doi-asserted-by":"publisher","DOI":"10.1007\/BF02309342"},{"key":"S0962492914000038_ref183","first-page":"548","volume-title":"Proc. 27th IEEE International Parallel Distributed Processing Symposium: IPDPS '13","author":"Solomonik","year":"2013"},{"key":"S0962492914000038_ref099","volume-title":"The Future of Computing Performance: Game Over or Next Level?","author":"Fuller","year":"2011"},{"key":"S0962492914000038_ref074","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-007-0114-x"},{"key":"S0962492914000038_ref206","unstructured":"Vuduc R. W. (2003), Automatic performance tuning of sparse matrix kernels. PhD thesis, EECS Department, UC Berkeley."},{"key":"S0962492914000038_ref062","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(89)90062-8"},{"key":"S0962492914000038_ref035","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-010-9285-4"},{"key":"S0962492914000038_ref104","doi-asserted-by":"publisher","DOI":"10.1137\/S106482750340783X"},{"key":"S0962492914000038_ref131","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356055"},{"key":"S0962492914000038_ref089","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971811"},{"key":"S0962492914000038_ref109","volume-title":"Parallel library software for the multishift QR algorithm with aggressive early deflation","author":"Granat","year":"2012"},{"key":"S0962492914000038_ref022","first-page":"35","volume-title":"Proc. 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming: PPoPP '12","author":"Ballard","year":"2012"},{"key":"S0962492914000038_ref111","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970937"},{"key":"S0962492914000038_ref115","volume-title":"Communication avoiding ILU(0) preconditioned Research report RR-8266","author":"Grigori","year":"2013"},{"key":"S0962492914000038_ref084","volume-title":"Proc. 20th IEEE International Parallel and Distributed Processing Symposium: IPDPS 2006","author":"Devine","year":"2006"},{"key":"S0962492914000038_ref118","doi-asserted-by":"publisher","DOI":"10.1137\/0917055"},{"key":"S0962492914000038_ref121","first-page":"271","volume-title":"Acta Numerica","volume":"6","author":"Gutknecht","year":"1997"},{"key":"S0962492914000038_ref125","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503292"},{"key":"S0962492914000038_ref071","doi-asserted-by":"publisher","DOI":"10.1137\/0210049"},{"key":"S0962492914000038_ref128","unstructured":"Hoemmen M. (2010), Communication-avoiding Krylov subspace methods. PhD thesis, EECS Department, UC Berkeley."},{"key":"S0962492914000038_ref130","first-page":"326","volume-title":"Proc. 13th Annual ACM Symposium on Theory of Computing: STOC '81","author":"Hong","year":"1981"},{"key":"S0962492914000038_ref136","volume-title":"Parallel Comput","author":"Johnsson","year":"1992"},{"key":"S0962492914000038_ref134","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2004.03.021"},{"key":"S0962492914000038_ref003","doi-asserted-by":"publisher","DOI":"10.1147\/rd.395.0575"},{"key":"S0962492914000038_ref137","doi-asserted-by":"publisher","DOI":"10.1080\/00207169208804107"},{"key":"S0962492914000038_ref029","volume-title":"Sequential communication bounds for fast linear algebra","author":"Ballard","year":"2012"},{"key":"S0962492914000038_ref138","first-page":"1","volume-title":"Applied Parallel and Scientific Computing","author":"K\u00e5gstr\u00f6m","year":"2012"},{"key":"S0962492914000038_ref139","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.05.001"},{"key":"S0962492914000038_ref143","volume-title":"Proc. PPAM '13","volume":"8384","author":"Knight","year":"2014"},{"key":"S0962492914000038_ref007","first-page":"368","volume-title":"Proc. 6th International Euro-Par Conference on Parallel Processing","author":"Ahmed","year":"2000"},{"key":"S0962492914000038_ref187","doi-asserted-by":"publisher","DOI":"10.1137\/0613025"},{"key":"S0962492914000038_ref122","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479896306744"},{"key":"S0962492914000038_ref186","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1985.1676570"},{"key":"S0962492914000038_ref148","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1473"},{"key":"S0962492914000038_ref150","unstructured":"Lipshitz B. (2013), Communication-avoiding parallel recursive algorithms for matrix multiplication. Master's thesis, EECS Department, UC Berkeley."},{"key":"S0962492914000038_ref201","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199704)9:4<255::AID-CPE250>3.0.CO;2-2"},{"key":"S0962492914000038_ref155","first-page":"471","volume-title":"Acta Numerica","volume":"15","author":"Meurant","year":"2006"},{"key":"S0962492914000038_ref158","volume-title":"Proc. International Conference on High Performance Computing Networking, Storage and Analysis: SC '09","author":"Mohiyuddin","year":"2009"},{"key":"S0962492914000038_ref013","doi-asserted-by":"publisher","DOI":"10.1007\/s002110050264"},{"key":"S0962492914000038_ref163","first-page":"386","article-title":"On the solution of a system of linear equations whose matrix is symmetric but not definite","volume":"10","author":"Parlett","year":"1970","journal-title":"BIT Numer. Math."},{"key":"S0962492914000038_ref164","first-page":"105","article-title":"A look-ahead Lanczos algorithm for unsymmetric matrices","volume":"44","author":"Parlett","year":"1985","journal-title":"Math. Comp."},{"key":"S0962492914000038_ref064","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(96)00022-1"},{"key":"S0962492914000038_ref068","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45706-2_94"},{"key":"S0962492914000038_ref127","volume-title":"Note on a Householder implementation of the GMRES method","author":"Hindmarsh","year":"1986"},{"key":"S0962492914000038_ref193","first-page":"117","volume-title":"Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures","author":"Tang","year":"2011"},{"key":"S0962492914000038_ref170","doi-asserted-by":"publisher","DOI":"10.1137\/0906059"},{"key":"S0962492914000038_ref173","doi-asserted-by":"publisher","DOI":"10.1137\/S1064829598339761"},{"key":"S0962492914000038_ref176","doi-asserted-by":"publisher","DOI":"10.1137\/0910005"},{"key":"S0962492914000038_ref167","doi-asserted-by":"publisher","DOI":"10.1137\/0913042"},{"key":"S0962492914000038_ref051","unstructured":"Buttari A. , Langou J. , Kurzak J. and Dongarra J. J. (2007), A class of parallel tiled linear algebra algorithms for multicore architectures. LAPACK Working Note 191."},{"key":"S0962492914000038_ref133","doi-asserted-by":"publisher","DOI":"10.1145\/73560.73588"},{"key":"S0962492914000038_ref177","first-page":"272","article-title":"\u00fcber einen Grenz\u00fcbergang durch alternierendes Verfahren","volume":"15","author":"Schwarz","year":"1870","journal-title":"Vierteljahrsschrift der Naturforschenden Gesellschaft in Z\u00fcrich"},{"key":"S0962492914000038_ref214","doi-asserted-by":"publisher","DOI":"10.1002\/nla.691"},{"key":"S0962492914000038_ref180","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-07546-1"},{"key":"S0962492914000038_ref120","doi-asserted-by":"publisher","DOI":"10.1147\/rd.416.0737"},{"key":"S0962492914000038_ref126","doi-asserted-by":"publisher","DOI":"10.6028\/jres.049.044"},{"key":"S0962492914000038_ref078","doi-asserted-by":"publisher","DOI":"10.21236\/ADA584728"},{"key":"S0962492914000038_ref190","doi-asserted-by":"publisher","DOI":"10.1007\/BF02165411"},{"key":"S0962492914000038_ref047","unstructured":"B\u00f6rm S. , Grasedyck L. and Hackbusch W. (2004), Hierarchical matrices. http:\/\/www.mis.mpg.de\/scicomp\/Fulltext\/WS_HMatrices.pdf"},{"key":"S0962492914000038_ref184","doi-asserted-by":"publisher","DOI":"10.21236\/ADA604834"},{"key":"S0962492914000038_ref195","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013588221172"},{"key":"S0962492914000038_ref063","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(89)90045-9"},{"key":"S0962492914000038_ref154","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718140"},{"key":"S0962492914000038_ref135","doi-asserted-by":"publisher","DOI":"10.1137\/0912056"},{"key":"S0962492914000038_ref196","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2006.04.017"},{"key":"S0962492914000038_ref002","doi-asserted-by":"publisher","DOI":"10.1007\/BF01939404"},{"key":"S0962492914000038_ref157","unstructured":"Mohiyuddin M. (2012), Tuning hardware and software for multiprocessors. PhD thesis, EECS Department, UC Berkeley."},{"key":"S0962492914000038_ref033","first-page":"232","volume-title":"Proc. 25th ACM Symposium on Parallelism in Algorithms and Architectures: SPAA '13","author":"Ballard","year":"2013"},{"key":"S0962492914000038_ref147","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355847"},{"key":"S0962492914000038_ref178","first-page":"627","volume-title":"31st International Symposium on Theoretical Aspects of Computer Science: STACS 2014","volume":"25","author":"Scquizzato","year":"2014"},{"key":"S0962492914000038_ref070","first-page":"1","article-title":"The University of Florida sparse matrix collection","volume":"38","author":"Davis","year":"2011","journal-title":"ACM Trans. Math. Softw."},{"key":"S0962492914000038_ref142","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(92)90085-C"},{"key":"S0962492914000038_ref066","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1534"},{"key":"S0962492914000038_ref165","volume-title":"Data flow and storage allocation for the PDQ-5 program on the Philco-2000","author":"Pfeifer","year":"1963"},{"key":"S0962492914000038_ref085","doi-asserted-by":"publisher","DOI":"10.1145\/77626.77627"},{"key":"S0962492914000038_ref094","first-page":"345","article-title":"Algorithm 97: Shortest path","volume":"5","author":"Floyd","year":"1962","journal-title":"Commun. Assoc. Comput. Mach."},{"key":"S0962492914000038_ref017","volume-title":"Communication-avoiding symmetric-indefinite factorization. Technical report UCB\/EECS-2013-127","author":"Ballard","year":"2013"},{"key":"S0962492914000038_ref042","doi-asserted-by":"publisher","DOI":"10.1145\/365723.365735"},{"key":"S0962492914000038_ref166","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2010.12.009"},{"key":"S0962492914000038_ref095","doi-asserted-by":"publisher","DOI":"10.1145\/966049.781525"},{"key":"S0962492914000038_ref043","doi-asserted-by":"publisher","DOI":"10.1007\/BF01934122"},{"key":"S0962492914000038_ref140","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719918"},{"key":"S0962492914000038_ref036","doi-asserted-by":"publisher","DOI":"10.4310\/MRL.2010.v17.n4.a6"},{"key":"S0962492914000038_ref082","first-page":"1","volume-title":"Proc. 2008 IEEE International Parallel and Distributed Processing Symposium: IPDPS 2008","author":"Demmel","year":"2008"},{"key":"S0962492914000038_ref103","first-page":"377","article-title":"The analysis of a nested dissection algorithm","author":"Gilbert","year":"1987","journal-title":"Numer. Math."},{"key":"S0962492914000038_ref087","doi-asserted-by":"publisher","DOI":"10.1145\/42288.42292"},{"key":"S0962492914000038_ref188","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827500370883"}],"container-title":["Acta Numerica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0962492914000038","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,6]],"date-time":"2019-06-06T13:21:18Z","timestamp":1559827278000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0962492914000038\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,5]]},"references-count":216,"alternative-id":["S0962492914000038"],"URL":"https:\/\/doi.org\/10.1017\/s0962492914000038","relation":{},"ISSN":["0962-4929","1474-0508"],"issn-type":[{"value":"0962-4929","type":"print"},{"value":"1474-0508","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,5]]}}}