{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:47Z","timestamp":1750306607996,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,10,12]],"date-time":"2015-10-12T00:00:00Z","timestamp":1444608000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"eSSENCE"},{"name":"UMIT Research Lab via an EU M\u00e5l 2 project"},{"DOI":"10.13039\/501100004359","name":"Swedish Research Council","doi-asserted-by":"crossref","award":["A0581501"],"award-info":[{"award-number":["A0581501"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"crossref"}]},{"name":"a strategic collaborative e-Science programme"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2015,10,26]]},"abstract":"<jats:p>Library software implementing a parallel small-bulge multishift QR algorithm with Aggressive Early Deflation (AED) targeting distributed memory high-performance computing systems is presented. Starting from recent developments of the parallel multishift QR algorithm [Granat et al., SIAM J. Sci. Comput. 32(4), 2010], we describe a number of algorithmic and implementation improvements. These include communication avoiding algorithms via data redistribution and a refined strategy for balancing between multishift QR sweeps and AED. Guidelines concerning several important tunable algorithmic parameters are provided. As a result of these improvements, a computational bottleneck within AED has been removed in the parallel multishift QR algorithm. A performance model is established to explain the scalability behavior of the new parallel multishift QR algorithm. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm.<\/jats:p>","DOI":"10.1145\/2699471","type":"journal-article","created":{"date-parts":[[2015,10,13]],"date-time":"2015-10-13T22:31:00Z","timestamp":1444775460000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Algorithm 953"],"prefix":"10.1145","volume":"41","author":[{"given":"Robert","family":"Granat","sequence":"first","affiliation":[{"name":"Ume\u00e5 University, Ume\u00e5, Sweden"}]},{"given":"Bo","family":"K\u00e5gstr\u00f6m","sequence":"additional","affiliation":[{"name":"Ume\u00e5 University, Ume\u00e5, Sweden"}]},{"given":"Daniel","family":"Kressner","sequence":"additional","affiliation":[{"name":"EPF Lausanne, Lausanne, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4914-7666","authenticated-orcid":false,"given":"Meiyue","family":"Shao","sequence":"additional","affiliation":[{"name":"Ume\u00e5 University and EPF Lausanne, Lausanne, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2015,10,12]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/180\/1\/012037"},{"key":"e_1_2_2_2_1","unstructured":"M. Ahues and F. Tisseur. 1997. A New Deflation Criterion for the QR Algorithm. LAPACK Working Note 122.  M. Ahues and F. Tisseur. 1997. A New Deflation Criterion for the QR Algorithm. LAPACK Working Note 122."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.05.002"},{"key":"e_1_2_2_4_1","volume-title":"Technical Report CS-97-355. Department of Computer Science","author":"Bai Z.","year":"1997","unstructured":"Z. Bai , D. Day , J. W. Demmel , and J. J. Dongarra . 1997 a. A Test Matrix Collection for Non-Hermitian Eigenvalue Problems (Release 1.0). Technical Report CS-97-355. Department of Computer Science , University of Tennessee. Also available online from http:\/\/math.nist.gov\/MatrixMarket. Z. Bai, D. Day, J. W. Demmel, and J. J. Dongarra. 1997a. A Test Matrix Collection for Non-Hermitian Eigenvalue Problems (Release 1.0). Technical Report CS-97-355. Department of Computer Science, University of Tennessee. Also available online from http:\/\/math.nist.gov\/MatrixMarket."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053389000068"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(93)90286-W"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595281368"},{"key":"e_1_2_2_8_1","volume-title":"Technical Report UCB\/EECS-2010-136. EECS Department","author":"Ballard G.","year":"2010","unstructured":"G. Ballard , J. W. Demmel , and I. Dumitriu . 2010 . Minimizing Communication for Eigenproblems and the Singular Value Decomposition. Technical Report UCB\/EECS-2010-136. EECS Department , University of California, Berkeley. Also as LAPACK Working Note 237. G. Ballard, J. W. Demmel, and I. Dumitriu. 2010. Minimizing Communication for Eigenproblems and the Singular Value Decomposition. Technical Report UCB\/EECS-2010-136. EECS Department, University of California, Berkeley. Also as LAPACK Working Note 237."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055531.1055533"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"L. S. Blackford J. Choi A. Cleary E. D\u2019Azevedo J. W. Demmel I. Dhillon J. J. Dongarra S. Hammarling G. Henry A. Petitet K. Stanley D. Walker and R. C. Whaley. 1997. ScaLAPACK User\u2019s Guide. Society for Industrial and Applied Mathematics Philadelphia PA.  L. S. Blackford J. Choi A. Cleary E. D\u2019Azevedo J. W. Demmel I. Dhillon J. J. Dongarra S. Hammarling G. Henry A. Petitet K. Stanley D. Walker and R. C. Whaley. 1997. ScaLAPACK User\u2019s Guide. Society for Industrial and Applied Mathematics Philadelphia PA.","DOI":"10.1137\/1.9780898719642"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479801384573"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479801384585"},{"key":"e_1_2_2_14_1","unstructured":"R. Byers. 2007. LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multi-Shift QR Algorithm with Aggressive Early Deflation. LAPACK Working Note 187.  R. Byers. 2007. LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multi-Shift QR Algorithm with Aggressive Early Deflation. LAPACK Working Note 187."},{"key":"e_1_2_2_15_1","unstructured":"EPFL. 2013. Bellatrix. URL: http:\/\/hpc.epfl.ch\/clusters\/bellatrix\/.  EPFL. 2013. Bellatrix. URL: http:\/\/hpc.epfl.ch\/clusters\/bellatrix\/."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/838250.838256"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/4.4.332"},{"key":"e_1_2_2_18_1","volume-title":"Matrix Computations","author":"Golub G. H.","unstructured":"G. H. Golub and C. F. Van Loan . 1996. Matrix Computations ( 3 rd ed.). Johns Hopkins University Press , Baltimore, MD . G. H. Golub and C. F. Van Loan. 1996. Matrix Computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD.","edition":"3"},{"key":"e_1_2_2_19_1","unstructured":"A. Grama A. Gupta G. Karypis and V. Kumar. 2003. Introduction to Parallel Computing (2nd ed.). Addison-Wesley Boston MA.  A. Grama A. Gupta G. Karypis and V. Kumar. 2003. Introduction to Parallel Computing (2nd ed.). Addison-Wesley Boston MA."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v21:9"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1137\/090756934"},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"R. Granat B. K\u00e5gstr\u00f6m D. Kressner and M. Shao. 2014a. Algorithm 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation\u2014Electronic Appendix: Derivation of the Performance Model.  R. Granat B. K\u00e5gstr\u00f6m D. Kressner and M. Shao. 2014a. Algorithm 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation\u2014Electronic Appendix: Derivation of the Performance Model.","DOI":"10.1145\/2699471"},{"key":"e_1_2_2_23_1","volume-title":"Technical Report UMINF-14.24. Department of Computing Science","author":"Granat R.","year":"2014","unstructured":"R. Granat , B. K\u00e5gstr\u00f6m , D. Kressner , and M. Shao . 2014 b. PDHSEQR User\u2019s Guide . Technical Report UMINF-14.24. Department of Computing Science , Ume\u00e5 University . Available from http:\/\/www8.cs.umu.se\/research\/uminf\/. R. Granat, B. K\u00e5gstr\u00f6m, D. Kressner, and M. Shao. 2014b. PDHSEQR User\u2019s Guide. Technical Report UMINF-14.24. Department of Computing Science, Ume\u00e5 University. Available from http:\/\/www8.cs.umu.se\/research\/uminf\/."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1137\/0917056"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827597325165"},{"key":"e_1_2_2_27_1","unstructured":"HPC2N. 2013a. Abisko. URL: http:\/\/www.hpc2n.umu.se\/resources\/abisko\/.  HPC2N. 2013a. Abisko. URL: http:\/\/www.hpc2n.umu.se\/resources\/abisko\/."},{"key":"e_1_2_2_28_1","unstructured":"HPC2N. 2013b. Akka. URL: http:\/\/www.hpc2n.umu.se\/resources\/akka\/.  HPC2N. 2013b. Akka. URL: http:\/\/www.hpc2n.umu.se\/resources\/akka\/."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2004.03.021"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10543-008-0180-1"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28151-8_1"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.05.001"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02163331"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(94)90112-0"},{"key":"e_1_2_2_36_1","volume-title":"Lecture Notes in Computer Science","volume":"6","author":"Smith B. T.","unstructured":"B. T. Smith , J. M. Boyle , J. J. Dongarra , B. S. Garbow , Y. Ikebe , V. C. Klema , and C. B. Moler . 1976. Matrix Eigensystem Routines\u2014EISPACK Guide (2nd ed.) . Lecture Notes in Computer Science , Vol. 6 . Springer-Verlag, New York. B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler. 1976. Matrix Eigensystem Routines\u2014EISPACK Guide (2nd ed.). Lecture Notes in Computer Science, Vol. 6. Springer-Verlag, New York."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2010.06.001"},{"volume-title":"Storage schemes for parallel eigenvalue algorithms","author":"van de Geijn R. A.","key":"e_1_2_2_38_1","unstructured":"R. A. van de Geijn . 1988. Storage schemes for parallel eigenvalue algorithms . In Numerical Linear Algebra Digital Signal Processing and Parallel Algorithms, Gene H. Golub and Paul Van Dooren (Eds.). Springer-Verlag , 639--648. R. A. van de Geijn. 1988. Storage schemes for parallel eigenvalue algorithms. In Numerical Linear Algebra Digital Signal Processing and Parallel Algorithms, Gene H. Golub and Paul Van Dooren (Eds.). Springer-Verlag, 639--648."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/0614016"},{"volume-title":"4th Conference on Hypercube Concurrent Computers and Applications. 697--700","author":"van de Geijn R. A.","key":"e_1_2_2_40_1","unstructured":"R. A. van de Geijn and D. G. Hudson . 1989. An efficient parallel implementation of the nonsymmetric QR algorithm . In 4th Conference on Hypercube Concurrent Computers and Applications. 697--700 . R. A. van de Geijn and D. G. Hudson. 1989. An efficient parallel implementation of the nonsymmetric QR algorithm. In 4th Conference on Hypercube Concurrent Computers and Applications. 697--700."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/0915057"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699471","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2699471","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:58Z","timestamp":1750227418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699471"}},"subtitle":["Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation"],"short-title":[],"issued":{"date-parts":[[2015,10,12]]},"references-count":39,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,10,26]]}},"alternative-id":["10.1145\/2699471"],"URL":"https:\/\/doi.org\/10.1145\/2699471","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"type":"print","value":"0098-3500"},{"type":"electronic","value":"1557-7295"}],"subject":[],"published":{"date-parts":[[2015,10,12]]},"assertion":[{"value":"2013-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}