{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:28:03Z","timestamp":1768030083532,"version":"3.49.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,3,29]],"date-time":"2023-03-29T00:00:00Z","timestamp":1680048000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Department of Energy, National Nuclear Security Administration","award":["DE-NA0003963 and DE-NA0003966"],"award-info":[{"award-number":["DE-NA0003963 and DE-NA0003966"]}]},{"name":"National Science Foundation","award":["OCI0725070, and ACI-1238993"],"award-info":[{"award-number":["OCI0725070, and ACI-1238993"]}]},{"DOI":"10.13039\/100010548","name":"National Center for Supercomputing Applications","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100010548","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Department of Energy, National Nuclear Security Administration","award":["DE-NA0002374"],"award-info":[{"award-number":["DE-NA0002374"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>Krylov methods are a key way of solving large sparse linear systems of equations but suffer from poor strong scalability on distributed memory machines. This is due to high synchronization costs from large numbers of collective communication calls alongside a low computational workload. Enlarged Krylov methods address this issue by decreasing the total iterations to convergence, an artifact of splitting the initial residual and resulting in operations on block vectors. In this article, we present a performance study of an enlarged Krylov method, Enlarged Conjugate Gradients (ECG), noting the impact of block vectors on parallel performance at scale. Most notably, we observe the increased overhead of point-to-point communication as a result of denser messages in the sparse matrix-block vector multiplication kernel. Additionally, we present models to analyze expected performance of ECG, as well as motivate design decisions. Most importantly, we introduce a new point-to-point communication approach based on node-aware communication techniques that increases efficiency of the method at scale.<\/jats:p>","DOI":"10.1145\/3580003","type":"journal-article","created":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T12:07:21Z","timestamp":1673957241000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Performance Analysis and Optimal Node-aware Communication for Enlarged Conjugate Gradient Methods"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4938-6111","authenticated-orcid":false,"given":"Shelby","family":"Lockhart","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8891-934X","authenticated-orcid":false,"given":"Amanda","family":"Bienz","sequence":"additional","affiliation":[{"name":"University of New Mexico, Albuquerque, New Mexico, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2905-3029","authenticated-orcid":false,"given":"William","family":"Gropp","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5283-6104","authenticated-orcid":false,"given":"Luke","family":"Olson","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,3,29]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium","author":"Agarwal Tarun","year":"2006","unstructured":"Tarun Agarwal, Amit Sharma, A. Laxmikant, and Laxmikant V. Kal\u00e9. 2006. Topology-aware task mapping for reducing communication contention on large parallel machines. In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium. IEEE, 10."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3129"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1997.1346"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2020.06.009"},{"key":"e_1_3_1_6_2","first-page":"102","volume-title":"International Conference on High Performance Computing for Computational Science","author":"Baker Allison H.","year":"2010","unstructured":"Allison H. Baker, Martin Schulz, and Ulrike M. Yang. 2010. On the performance of an algebraic multigrid solver on multicore clusters. In International Conference on High Performance Computing for Computational Science. Springer, 102\u2013115."},{"key":"e_1_3_1_7_2","unstructured":"Satish Balay Shrirang Abhyankar Mark Adams Jed Brown Peter Brune Kris Buschelman Lisandro Dalcin Alp Dener Victor Eijkhout W. Gropp et\u00a0al. 2019. PETSc Users Manual ."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2019.03.016"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1145\/3236367.3236368"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342020925535"},{"key":"e_1_3_1_11_2","unstructured":"Amanda Bienz and Luke N. Olson. 2017. RAPtor: Parallel Algebraic Multigrid v0.1 Release 0.1. Retrieved from https:\/\/github.com\/raptor-library\/raptor."},{"key":"e_1_3_1_12_2","first-page":"339","volume-title":"Contemporary High Performance Computing","author":"Bode Brett","year":"2013","unstructured":"Brett Bode, Michelle Butler, Thom Dunning, Torsten Hoefler, William Kramer, William Gropp, and Wen-mei Hwu. 2013. The blue waters super-system for super-science. In Contemporary High Performance Computing. Chapman & Hall\/CRC, 339\u2013366. https:\/\/www.taylorfrancis.com\/books\/e\/9781466568358."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1137\/120881191"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.780863"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1137\/080737770"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(89)90045-9"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.04.005"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/173284.155333"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.06.001"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1137\/140989492"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1137\/18M1196285"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2966884.2966919"},{"issue":"3","key":"e_1_3_1_24_2","first-page":"1:1\u20131:10","article-title":"The CORAL supercomputer systems","volume":"64","author":"Hanson W. A.","year":"2020","unstructured":"W. A. Hanson. 2020. The CORAL supercomputer systems. IBM J. Res. Dev. 64, 3\/4 (2020), 1:1\u20131:10.","journal-title":"IBM J. Res. Dev."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00048-X"},{"key":"e_1_3_1_26_2","first-page":"17","volume-title":"High Performance Parallel I\/O","author":"Kramer William","year":"2015","unstructured":"William Kramer, Michelle Butler, Gregory Bauer, Kalyana Chadalavada, and Celso Mendes. 2015. Blue waters parallel I\/O storage sub-system. In High Performance Parallel I\/O, Prabhat and Quincey Koziol (Eds.). CRC Publications, Taylor & Francis Group, 17\u201332."},{"key":"e_1_3_1_27_2","volume-title":"Communication-Avoiding Krylov Subspace Methods","author":"M. Hoemmen","year":"2010","unstructured":"Hoemmen M. 2010. Communication-Avoiding Krylov Subspace Methods. Ph.D. Dissertation. University of California, Berkeley."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3609"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.10.001"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654096"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1137\/18M1182528"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(80)90247-5"},{"key":"e_1_3_1_33_2","first-page":"406","volume-title":"Proceedings of the International Conference on High Performance Computing & Simulation (HPCS\u201918)","author":"Page Brian A.","year":"2018","unstructured":"Brian A. Page and Peter M. Kogge. 2018. Scalability of hybrid sparse matrix dense vector (spmv) multiplication. In Proceedings of the International Conference on High Performance Computing & Simulation (HPCS\u201918). IEEE, 406\u2013414."},{"key":"e_1_3_1_34_2","first-page":"1","volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201902)","author":"Tr\u00e4ff Jesper Larsson","year":"2002","unstructured":"Jesper Larsson Tr\u00e4ff. 2002. Implementing the MPI process topology mechanism. In Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201902). IEEE, Los Alamitos, CA, 1\u201314."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144502409019"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580003","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580003","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:18Z","timestamp":1750182558000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,29]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3580003"],"URL":"https:\/\/doi.org\/10.1145\/3580003","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"value":"2329-4949","type":"print"},{"value":"2329-4957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,29]]},"assertion":[{"value":"2022-03-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}