{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T04:04:58Z","timestamp":1743393898304,"version":"3.40.3"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["SiVeGCS"],"award-info":[{"award-number":["SiVeGCS"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Forschungszentrum J\u00fclich GmbH"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>We present tfQMRgpu, a GPU-accelerated iterative linear solver based on the transpose-free quasi-minimal residual (tfQMR) method. Designed for large-scale electronic structure calculations, particularly in the context of Korringa\u2013Kohn\u2013Rostoker density functional theory, tfQMRgpu efficiently handles block-sparse complex matrices arising from multiple scattering theory. The solver exploits GPU parallelism to accelerate convergence while leveraging memory-efficient sparse storage formats. 
By unifying the solution of multiple right-hand side (RHS) block vectors, tfQMRgpu significantly improves throughput, demonstrating up to a <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$3.5\\times$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>3.5<\/mml:mn>\n                    <mml:mo>\u00d7<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> speedup on modern GPUs. Additionally, we introduce a flexible implementation framework that supports both explicit matrix-based and matrix-free operator formulations, such as high-order finite-difference stencils for real-space grid-based Green function calculations. Benchmarks on various NVIDIA GPUs demonstrate the solver\u2019s efficiency, in some cases achieving over 56% of peak floating-point performance for block-sparse matrix multiplications. tfQMRgpu 
is open-source, providing interfaces for C, C++, Fortran, Julia, and Python, making it a versatile tool for high-performance computing applications that can benefit from the unification of RHS problems.<\/jats:p>","DOI":"10.1007\/s11227-025-07145-6","type":"journal-article","created":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T03:55:44Z","timestamp":1743306944000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["tfQMRgpu: a GPU-accelerated linear solver with block-sparse complex result matrix"],"prefix":"10.1007","volume":"81","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2005-4474","authenticated-orcid":false,"given":"Paul F.","family":"Baumeister","sequence":"first","affiliation":[]},{"given":"Stepan","family":"Nassyr","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"7145_CR1","doi-asserted-by":"publisher","first-page":"864","DOI":"10.1103\/PhysRev.136.B864","volume":"136","author":"P Hohenberg","year":"1964","unstructured":"Hohenberg P, Kohn W (1964) Inhomogeneous electron gas. Phys Rev 136:864\u2013871. https:\/\/doi.org\/10.1103\/PhysRev.136.B864","journal-title":"Phys Rev"},{"key":"7145_CR2","doi-asserted-by":"publisher","first-page":"1133","DOI":"10.1103\/PhysRev.140.A1133","volume":"140","author":"W Kohn","year":"1965","unstructured":"Kohn W, Sham LJ (1965) Self-consistent equations including exchange and correlation effects. Phys Rev 140:1133\u20131138. https:\/\/doi.org\/10.1103\/PhysRev.140.A1133","journal-title":"Phys Rev"},{"key":"7145_CR3","doi-asserted-by":"publisher","first-page":"31360","DOI":"10.1039\/C5CP00437C","volume":"17","author":"S Mohr","year":"2015","unstructured":"Mohr S, Ratcliff LE, Genovese L, Caliste D, Boulanger P, Goedecker S, Deutsch T (2015) Accurate and efficient linear scaling dft calculations with universal applicability. 
Phys Chem Chem Phys 17:31360\u201331370. https:\/\/doi.org\/10.1039\/C5CP00437C","journal-title":"Phys Chem Chem Phys"},{"issue":"16","key":"7145_CR4","doi-asserted-by":"publisher","DOI":"10.1063\/5.0005074","volume":"152","author":"A Nakata","year":"2020","unstructured":"Nakata A, Baker JS, Mujahed SY, Poulton JTL, Arapan S, Lin J, Raza Z, Yadav S, Truflandier L, Miyazaki T, Bowler DR (2020) Large scale and linear scaling dft with the conquest code. J Chem Phys 152(16):164112. https:\/\/doi.org\/10.1063\/5.0005074","journal-title":"J Chem Phys"},{"issue":"10","key":"7145_CR5","doi-asserted-by":"publisher","first-page":"3565","DOI":"10.1021\/ct200897x","volume":"8","author":"J VandeVondele","year":"2012","unstructured":"VandeVondele J, Bor\u0161tnik U, Hutter J (2012) Linear scaling self-consistent field calculations with millions of atoms in the condensed phase. J Chem Theory Comput 8(10):3565\u20133573. https:\/\/doi.org\/10.1021\/ct200897x","journal-title":"J Chem Theory Comput"},{"issue":"6","key":"7145_CR6","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1016\/0031-8914(47)90013-X","volume":"13","author":"J Korringa","year":"1947","unstructured":"Korringa J (1947) On the calculation of the energy of a Bloch wave in a metal. Physica 13(6):392\u2013400. https:\/\/doi.org\/10.1016\/0031-8914(47)90013-X","journal-title":"Physica"},{"key":"7145_CR7","doi-asserted-by":"publisher","first-page":"1111","DOI":"10.1103\/PhysRev.94.1111","volume":"94","author":"W Kohn","year":"1954","unstructured":"Kohn W, Rostoker N (1954) Solution of the Schr\u00f6dinger equation in periodic lattices with an application to metallic lithium. Phys Rev 94:1111\u20131120. 
https:\/\/doi.org\/10.1103\/PhysRev.94.1111","journal-title":"Phys Rev"},{"key":"7145_CR8","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.85.235103","volume":"85","author":"A Thiess","year":"2012","unstructured":"Thiess A, Zeller R, Bolten M, Dederichs PH, Bl\u00fcgel S (2012) Massively parallel density functional calculations for thousands of atoms: KKRnano. Phys Rev B 85:235103. https:\/\/doi.org\/10.1103\/PhysRevB.85.235103","journal-title":"Phys Rev B"},{"key":"7145_CR9","doi-asserted-by":"publisher","first-page":"8807","DOI":"10.1103\/PhysRevB.52.8807","volume":"52","author":"R Zeller","year":"1995","unstructured":"Zeller R, Dederichs PH, \u00dajfalussy B, Szunyogh L, Weinberger P (1995) Theory and convergence properties of the screened Korringa-Kohn-Rostoker method. Phys Rev B 52:8807\u20138812. https:\/\/doi.org\/10.1103\/PhysRevB.52.8807","journal-title":"Phys Rev B"},{"key":"7145_CR10","unstructured":"JuKKR Repository. Forschungszentrum J\u00fclich GmbH. https:\/\/iffgit.fz-juelich.de\/kkr\/jukkr Accessed 09 Sept 2023"},{"issue":"48","key":"7145_CR11","doi-asserted-by":"publisher","DOI":"10.1088\/1361-648X\/ab38a0","volume":"31","author":"M Bornemann","year":"2019","unstructured":"Bornemann M, Grytsiuk S, Baumeister PF, Santos Dias M, Zeller R, Lounis S, Bl\u00fcgel S (2019) Complex magnetism of B20-MnGe: from spin-spirals, hedgehogs to monopoles. J Phys Condens Matter 31(48):485801. https:\/\/doi.org\/10.1088\/1361-648X\/ab38a0","journal-title":"J Phys Condens Matter"},{"key":"7145_CR12","doi-asserted-by":"publisher","unstructured":"Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J, Du Croz J, Greenbaum A, Hammarling S, McKenney A, Sorensen D (1999) LAPACK Users\u2019 Guide, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA. 
https:\/\/doi.org\/10.1137\/1.9780898719604","DOI":"10.1137\/1.9780898719604"},{"issue":"2","key":"7145_CR13","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1137\/0914029","volume":"14","author":"RW Freund","year":"1993","unstructured":"Freund RW (1993) A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM J Sci Comput 14(2):470\u2013482. https:\/\/doi.org\/10.1137\/0914029","journal-title":"SIAM J Sci Comput"},{"issue":"1","key":"7145_CR14","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1007\/BF01385726","volume":"60","author":"RW Freund","year":"1991","unstructured":"Freund RW, Nachtigal NM (1991) QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numerische Math 60(1):315\u2013339. https:\/\/doi.org\/10.1007\/BF01385726","journal-title":"Numerische Math"},{"issue":"1","key":"7145_CR15","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1145\/225545.225551","volume":"22","author":"RW Freund","year":"1996","unstructured":"Freund RW, Nachtigal NM (1996) QMRPACK: a package of QMR algorithms. ACM Trans Math Softw 22(1):46\u201377. https:\/\/doi.org\/10.1145\/225545.225551","journal-title":"ACM Trans Math Softw"},{"key":"7145_CR16","doi-asserted-by":"publisher","unstructured":"Kelley CT (1995) Iterative methods for linear and nonlinear equations. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. https:\/\/doi.org\/10.1137\/1.9781611970944","DOI":"10.1137\/1.9781611970944"},{"key":"7145_CR17","doi-asserted-by":"publisher","DOI":"10.1145\/3480935","author":"H Anzt","year":"2022","unstructured":"Anzt H, Cojean T, Flegar G, G\u00f6bel F, Gr\u00fctzmacher T, Nayak P, Ribizel T, Tsai YM, Quintana-Ort\u00ed ES (2022) Ginkgo: a modern linear operator algebra framework for high performance computing. ACM Trans Math Softw. 
https:\/\/doi.org\/10.1145\/3480935","journal-title":"ACM Trans Math Softw"},{"issue":"5\u20136","key":"7145_CR18","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.parco.2014.03.012","volume":"40","author":"U Borstnik","year":"2014","unstructured":"Borstnik U, VandeVondele J, Weber V, Hutter J (2014) Sparse matrix multiplication: the distributed block-compressed sparse row library. Parallel Comput 40(5\u20136):47\u201358","journal-title":"Parallel Comput"},{"key":"7145_CR19","unstructured":"OpenAI Blocksparse. GitHub. https:\/\/cdn.openai.com\/blocksparse\/blocksparsepaper.pdf"},{"key":"7145_CR20","unstructured":"cuSOLVER v12.8 (2025). https:\/\/docs.nvidia.com\/cuda\/pdf\/CUSOLVER_Library.pdf"},{"key":"7145_CR21","unstructured":"cuSPARSE, the CUDA sparse matrix library v12.8 (2025). https:\/\/docs.nvidia.com\/cuda\/cusparse\/"},{"key":"7145_CR22","doi-asserted-by":"publisher","unstructured":"Cheik Ahamed A-K, Magoul\u00e8s F (2012) Iterative methods for sparse linear systems on graphics processing unit. In: 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, 836\u2013842. https:\/\/doi.org\/10.1109\/HPCC.2012.118","DOI":"10.1109\/HPCC.2012.118"},{"issue":"5","key":"7145_CR23","doi-asserted-by":"publisher","first-page":"1524","DOI":"10.1109\/TPDS.2023.3249110","volume":"34","author":"K Liegeois","year":"2023","unstructured":"Liegeois K, Rajamanickam S, Berger-Vergiat L (2023) Performance portable batched sparse linear solvers. IEEE Trans Parallel Distrib Syst 34(5):1524\u20131535. https:\/\/doi.org\/10.1109\/TPDS.2023.3249110","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"7145_CR24","unstructured":"NVIDIA-Corporation: (2017) NVIDIA Tesla V100 GPU Architecture, The World\u2019s Most Advanced Data Center GPU. Technical report. 
https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf"},{"issue":"4","key":"7145_CR25","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1143\/PTP.14.351","volume":"14","author":"T Matsubara","year":"1955","unstructured":"Matsubara T (1955) A new approach to quantum-statistical mechanics. Progr Theor Phys 14(4):351\u2013378. https:\/\/doi.org\/10.1143\/PTP.14.351","journal-title":"Progr Theor Phys"},{"issue":"144","key":"7145_CR26","doi-asserted-by":"publisher","first-page":"955","DOI":"10.1090\/S0025-5718-1978-0494848-1","volume":"32","author":"H Keller","year":"1978","unstructured":"Keller H, Pereyra V (1978) Symbolic generation of finite difference formulas. Math Comput 32(144):955\u2013971. https:\/\/doi.org\/10.1090\/S0025-5718-1978-0494848-1","journal-title":"Math Comput"},{"key":"7145_CR27","doi-asserted-by":"publisher","unstructured":"Baumeister PF, Hater T, Pleiter D, Boettiger H, Maurer T, Brunheroto JR (2017) Exploiting in-memory processing capabilities for density functional theory applications. In: Euro-Par 2016: Parallel Processing Workshops. Lecture Notes in Computer Science, vol 10104, pp 750\u2013762. Springer, Cham. Chap. 60. https:\/\/doi.org\/10.1007\/978-3-319-58943-5_60. https:\/\/juser.fz-juelich.de\/record\/830547","DOI":"10.1007\/978-3-319-58943-5_60"},{"key":"7145_CR28","unstructured":"Baumeister PF (2023) tfQMRgpu GitHub respository. https:\/\/github.com\/real-space\/tfQMRgpu Accessed 09 Sept 2023"},{"key":"7145_CR29","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.8333498","author":"P Baumeister","year":"2023","unstructured":"Baumeister P, Nassyr S (2023) Real-space\/tfQMRgpu: stable for reference publication. Zenodo. 
https:\/\/doi.org\/10.5281\/zenodo.8333498","journal-title":"Zenodo"},{"issue":"2","key":"7145_CR30","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1016\/j.laa.2011.05.019","volume":"436","author":"M Bolten","year":"2012","unstructured":"Bolten M, Thiess A, Yavneh I, Zeller R (2012) Preconditioning systems arising from the kkr green function method using block-circulant matrices. Linear Algebra Appl 436(2):436\u2013446. https:\/\/doi.org\/10.1016\/j.laa.2011.05.019","journal-title":"Linear Algebra Appl"},{"key":"7145_CR31","unstructured":"Yu R, Sturler E, Johnson DD (2002) A block iterative solver for complex non-hermitian systems applied to large-scale, electronic-structure calculations. Technical report, USA. https:\/\/dl.acm.org\/doi\/10.5555\/871118"},{"key":"7145_CR32","unstructured":"MPI-Forum: MPI (1994) A message-passing interface standard. Technical report, USA. https:\/\/www.mpi-forum.org\/docs\/mpi-3.1\/mpi31-report.pdf Accessed 09 Sept 2023"}],"container-title":["The Journal of 
Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07145-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-025-07145-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07145-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T03:56:13Z","timestamp":1743306973000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-025-07145-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,27]]},"references-count":32,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["7145"],"URL":"https:\/\/doi.org\/10.1007\/s11227-025-07145-6","relation":{},"ISSN":["1573-0484"],"issn-type":[{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,27]]},"assertion":[{"value":"1 March 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 March 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"663"}}