{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,27]],"date-time":"2026-06-27T07:13:04Z","timestamp":1782544384141,"version":"3.54.5"},"reference-count":49,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T00:00:00Z","timestamp":1657152000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100022273","name":"Gauss Centre for Supercomputing e.V","doi-asserted-by":"crossref","award":["pr83te"],"award-info":[{"award-number":["pr83te"]}],"id":[{"id":"10.13039\/501100022273","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Competence Network for Scientific High Performance Computing in Bavaria","award":["High-order matrix-free finite element implementati"],"award-info":[{"award-number":["High-order matrix-free finite element implementati"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:p>This work investigates a variant of the conjugate gradient (CG) method and embeds it into the context of high-order finite-element schemes with fast matrix-free operator evaluation and cheap preconditioners like the matrix diagonal. Relying on a data-dependency analysis and appropriate enumeration of degrees of freedom, we interleave the vector updates and inner products in a CG iteration with the matrix-vector product with only minor organizational overhead. 
As a result, around 90% of the vector entries of the three active vectors of the CG method are transferred from slow RAM memory exactly once per iteration, with all additional access hitting fast cache memory. Node-level performance analyses and scaling studies on up to 147k cores show that the CG method with the proposed performance optimizations is around two times faster than a standard CG solver as well as optimized pipelined CG and s-step CG methods for large sizes that exceed processor caches, and provides similar performance near the strong scaling limit.<\/jats:p>","DOI":"10.1177\/10943420221107880","type":"journal-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T21:22:15Z","timestamp":1657228935000},"page":"61-81","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":15,"title":["Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations"],"prefix":"10.1177","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8406-835X","authenticated-orcid":false,"given":"Martin","family":"Kronbichler","sequence":"first","affiliation":[{"name":"Institute for Computational Mechanics, Technical University of Munich, Germany"},{"name":"Department of Mathematics, University of Augsburg, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dmytro","family":"Sashko","sequence":"additional","affiliation":[{"name":"School of Mechanical and Mining Engineering, The University of Queensland, Saint Lucia, QLD, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter","family":"Munch","sequence":"additional","affiliation":[{"name":"Institute for Computational Mechanics, Technical University of Munich, Germany"},{"name":"Institute of Material Systems Modeling, Helmholtz-Zentrum Hereon, 
Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"bibr1-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.41"},{"key":"bibr2-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1515\/jnma-2020-0043"},{"key":"bibr3-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2020.02.022"},{"key":"bibr4-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47956-5_8"},{"key":"bibr5-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/2049673.2049678"},{"key":"bibr6-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/17m1148384"},{"key":"bibr7-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-010-9396-8"},{"key":"bibr8-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/10079163"},{"key":"bibr9-10943420221107880","unstructured":"Chalmers N, Warburton T (2020) Portable high-order finite element Kernels I: streaming operations. preprint ArXiv: 2009.10917."},{"key":"bibr10-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019842645"},{"key":"bibr11-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(89)90045-9"},{"key":"bibr12-10943420221107880","unstructured":"Cornelis J, Cools S, Vanroose W (2018) The communication-hiding conjugate gradient method with deep pipelines. 
ArXiv e-prints 1801.4728v3."},{"key":"bibr13-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1109\/TMAG.2010.2081662"},{"key":"bibr14-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/0045-7825(90)90081-v"},{"key":"bibr15-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511546792"},{"key":"bibr16-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/0902001"},{"key":"bibr17-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330358"},{"key":"bibr18-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1002\/fld.4511"},{"key":"bibr19-10943420221107880","unstructured":"Fischer P, Kerkemeier S, Peplinski A, et al. (2021) Nek5000 Web page. https:\/\/nek5000.mcs.anl.gov"},{"key":"bibr20-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/1094342020915762"},{"key":"bibr21-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.06.001"},{"key":"bibr22-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/18M1196285"},{"key":"bibr23-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/j.cma.2020.113608"},{"key":"bibr24-10943420221107880","volume-title":"Introduction to High Performance Computing for Scientists and Engineers","author":"Hager 
G","year":"2011"},{"key":"bibr25-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3424144"},{"key":"bibr26-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211020803"},{"key":"bibr27-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2017.07.039"},{"key":"bibr28-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99654-7_7"},{"key":"bibr29-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016671790"},{"key":"bibr30-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2012.04.012"},{"key":"bibr31-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3325864"},{"key":"bibr32-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3322813"},{"key":"bibr33-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/16M110455X"},{"key":"bibr34-10943420221107880","first-page":"1","volume-title":"HPC \u201917: Proceedings of the 25th High Performance Computing Symposium","author":"Ljungkvist K","year":"2017"},{"key":"bibr35-10943420221107880","doi-asserted-by":"crossref","unstructured":"Lockhart S, Bienz A, Gropp W, et al. (2022) Performance analysis and optimal node-aware communication for enlarged conjugate gradient methods. 
arXiv preprint arXiv:2203.06144.","DOI":"10.1145\/3580003"},{"key":"bibr36-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3155290"},{"key":"bibr37-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1109\/TMAG.2013.2244861"},{"key":"bibr38-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/19m1246523"},{"key":"bibr39-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/3469720"},{"key":"bibr40-10943420221107880","volume-title":"S-step and communication-avoiding iterative methods","author":"Naumov M","year":"2016"},{"key":"bibr41-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(80)90005-4"},{"key":"bibr42-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(84)90128-1"},{"key":"bibr43-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1145\/2907944"},{"key":"bibr44-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1137\/0906059"},{"key":"bibr45-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(92)90182-X"},{"key":"bibr46-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/1094342020945005"},{"key":"bibr47-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1177\/1094342018816368"},{"key":"bibr48-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2010.38"},{"key":"bibr49-10943420221107880","doi-asserted-by":"publisher","DOI":"10.1109\/SC.1999.10035"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221107880","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420221107880","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221107880","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:17:22Z","timestamp":1777450642000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420221107880"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,7]]},"references-count":49,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1177\/10943420221107880"],"URL":"https:\/\/doi.org\/10.1177\/10943420221107880","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,7]]}}}