{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T18:52:37Z","timestamp":1775501557778,"version":"3.50.1"},"reference-count":72,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2025,2,23]],"date-time":"2025-02-23T00:00:00Z","timestamp":1740268800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100007185","name":"TotalEnergies","doi-asserted-by":"publisher","award":["LLNL-JRNL-813686"],"award-info":[{"award-number":["LLNL-JRNL-813686"]}],"id":[{"id":"10.13039\/501100007185","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006227","name":"Lawrence Livermore National Laboratory","doi-asserted-by":"publisher","award":["DE-AC52-07NA2734"],"award-info":[{"award-number":["DE-AC52-07NA2734"]}],"id":[{"id":"10.13039\/100006227","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:p>This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures, specifically, those that are equipped with graphic processing units (GPUs). In addition to block-Jacobi, we present general purpose two-level ILU Schur complement-based approaches, where different strategies are presented to solve the coarse-level reduced system. These strategies are combined with modified ILU methods in the construction of the coarse-level operator, in order to effectively remove smooth errors by targeting an algebraically smooth vector. We leverage available GPU-based sparse matrix kernels to accelerate the setup and the solve phases of the proposed ILU preconditioner. We evaluate the efficiency of the proposed methods as a smoother for algebraic multigrid (AMG) and as a preconditioner for Krylov subspace methods on challenging anisotropic diffusion problems and a collection of general sparse matrices.<\/jats:p>","DOI":"10.1177\/10943420251319334","type":"journal-article","created":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T06:49:14Z","timestamp":1740379754000},"page":"424-442","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":3,"title":["A two-level GPU-accelerated incomplete LU preconditioner for general sparse linear systems"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3119-1957","authenticated-orcid":false,"given":"Tianshi","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Mathematics, Emory University, Atlanta, GA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2802-5763","authenticated-orcid":false,"given":"Rui Peng","family":"Li","sequence":"additional","affiliation":[{"name":"Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6111-6205","authenticated-orcid":false,"given":"Daniel","family":"Osei-Kuffuor","sequence":"additional","affiliation":[{"name":"Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA"}]}],"member":"179","published-online":{"date-parts":[[2025,2,23]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2020.06.009"},{"key":"e_1_3_3_3_1","first-page":"75","volume-title":"Proceedings of the Symposium on High Performance Computing, HPC \u201915","author":"Anzt H","year":"2015","unstructured":"Anzt H, Tomov S, Dongarra J (2015) Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product. In: Proceedings of the Symposium on High Performance Computing, HPC \u201915. San Diego, CA, USA: Society for Computer Simulation International, pp. 75\u201382."},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2017.06.016"},{"key":"e_1_3_3_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/100798806"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002110050430"},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479897319301"},{"key":"e_1_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1990-1023042-6"},{"key":"e_1_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1137\/090772216"},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1080\/17445760802337010"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cma.2021.114111"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/S106482759732678X"},{"key":"e_1_3_3_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827598340809"},{"key":"e_1_3_3_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.780863"},{"key":"e_1_3_3_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2016.03.012"},{"key":"e_1_3_3_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/140968896"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-0427(97)00171-4"},{"key":"e_1_3_3_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.05.002"},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1143320"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.4208\/cicp.OA-2016-0168"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.1137\/040615195"},{"key":"e_1_3_3_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2005.04.039"},{"key":"e_1_3_3_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0036142903429742"},{"key":"e_1_3_3_24_1","doi-asserted-by":"crossref","unstructured":"Falgout RD Yang UM (2002) hypre: a library of high performance preconditioners. In: International Conference on Computational Science Amsterdam The Netherlands 21\u201324 April 2002 pp. 632\u2013641 Springer.","DOI":"10.1007\/3-540-47789-6_66"},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.76"},{"key":"e_1_3_3_26_1","doi-asserted-by":"crossref","unstructured":"Gaidamour J H\u00e9non P (2008) HIPS: a parallel hybrid direct\/iterative solver based on a Schur complement approach. In: Sparse Days at CERFACS Workshop of Vecpar 08 Toulouse France 23\u201324 June 2008.","DOI":"10.1109\/CSE.2008.36"},{"key":"e_1_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2014.08.022"},{"key":"e_1_3_3_28_1","doi-asserted-by":"publisher","DOI":"10.11588\/emclpp.2017.06.42879"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1137\/0710032"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.1137\/0713023"},{"key":"e_1_3_3_31_1","doi-asserted-by":"publisher","DOI":"10.1137\/20M1344913"},{"key":"e_1_3_3_32_1","volume-title":"The Chaco User\u2019s Guide Version 2","author":"Hendrickson B","year":"1994","unstructured":"Hendrickson B, Leland R (1994) The Chaco User\u2019s Guide Version 2. Albuquerque NM: Sandia National Laboratories. ftp:\/\/ftp.cs.sandia.gov\/pub\/papers\/bahendr\/guide.ps.gz"},{"key":"e_1_3_3_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331561"},{"key":"e_1_3_3_34_1","doi-asserted-by":"crossref","unstructured":"Karypis G Kumar V (1997) Parallel threshold-based ILU factorization. In: SC\u201997: Proceedings of the 1997 ACM\/IEEE Conference on Supercomputing San Jose CA USA 15\u201321 November 1997 p. 28 IEEE.","DOI":"10.1145\/509593.509621"},{"key":"e_1_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_3_3_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0018528"},{"key":"e_1_3_3_37_1","doi-asserted-by":"publisher","unstructured":"Kolev T Dobrev V (2010) GLVis: opengl finite element visualization tool. glvis.org. DOI: 10.11578\/dc.20171025.1249.","DOI":"10.11578\/dc.20171025.1249"},{"key":"e_1_3_3_38_1","doi-asserted-by":"publisher","unstructured":"Kolev T Dobrev V USDOE (2010) MFEM: modular finite element methods [Software]. mfem.org. DOI: 10.11578\/dc.20171025.1248. https:\/\/www.osti.gov\/biblio\/1617672","DOI":"10.11578\/dc.20171025.1248"},{"key":"e_1_3_3_39_1","unstructured":"Labs P (2016) Paralution v1.1.0. https:\/\/www.paralution.com\/"},{"key":"e_1_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132402.3132415"},{"key":"e_1_3_3_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"e_1_3_3_42_1","doi-asserted-by":"publisher","DOI":"10.1002\/nla.325"},{"key":"e_1_3_3_43_1","doi-asserted-by":"publisher","DOI":"10.1137\/18M1170935"},{"key":"e_1_3_3_44_1","doi-asserted-by":"publisher","DOI":"10.1137\/18M1228128"},{"key":"e_1_3_3_45_1","doi-asserted-by":"crossref","unstructured":"Luo L Zhao Y Cai XC (2012) A hybrid implementation of two-level domain decomposition algorithm for solving elliptic equation on CPU\/GPUs. In: 2012 13th International Conference on Parallel and Distributed Computing Applications and Technologies Beijing China 14-16 December 2012 pp. 474\u2013477 IEEE.","DOI":"10.1109\/PDCAT.2012.18"},{"key":"e_1_3_3_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2009.09.003"},{"key":"e_1_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1002\/nla.341"},{"key":"e_1_3_3_48_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1980-0559197-0"},{"key":"e_1_3_3_49_1","unstructured":"Naumov M (2011) Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU. NVIDIA Corp. Westford MA USA Tech. Rep. NVR-2011 1."},{"key":"e_1_3_3_50_1","doi-asserted-by":"publisher","DOI":"10.1137\/140980260"},{"key":"e_1_3_3_51_1","unstructured":"Naumov M Castonguay P Cohen J (2015b) Parallel graph coloring with applications to the incomplete-LU factorization on the GPU. Nvidia White Paper."},{"key":"e_1_3_3_52_1","doi-asserted-by":"publisher","DOI":"10.5540\/tema.2018.019.01.59"},{"key":"e_1_3_3_53_1","unstructured":"Pellegrini F (2010) Scotch and libScotch 5.1 user\u2019s guide. INRIA Bordeaux Sud-Ouest IPB & LaBRI UMR CNRS 5800. https:\/\/gforge.inria.fr\/docman\/view.php\/248\/7104\/scotch_user5.1.pdf"},{"key":"e_1_3_3_54_1","doi-asserted-by":"publisher","DOI":"10.1137\/0611030"},{"key":"e_1_3_3_55_1","doi-asserted-by":"crossref","unstructured":"Rajamanickam S Boman EG Heroux MA (2012) Shylu: a hybrid-hybrid solver for multicore platforms. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Shanghai China 21\u201325 May 2012 pp. 631\u2013643 IEEE.","DOI":"10.1109\/IPDPS.2012.64"},{"key":"e_1_3_3_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.06.004"},{"key":"e_1_3_3_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMAG.2013.2283099"},{"key":"e_1_3_3_58_1","doi-asserted-by":"publisher","DOI":"10.1137\/15M1026419"},{"key":"e_1_3_3_59_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_3_3_60_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827597328996"},{"key":"e_1_3_3_61_1","doi-asserted-by":"publisher","DOI":"10.1137\/S089547989834126"},{"key":"e_1_3_3_62_1","doi-asserted-by":"crossref","unstructured":"Sao P Vuduc R Li XS (2014) A distributed CPU-GPU sparse direct solver. In: European Conference on Parallel Processing Porto Portugal 25\u201329 August 2014 Springer 487\u2013498.","DOI":"10.1007\/978-3-319-09873-9_41"},{"key":"e_1_3_3_63_1","doi-asserted-by":"publisher","DOI":"10.1177\/10943420221136873"},{"key":"e_1_3_3_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-013-0576-y"},{"key":"e_1_3_3_65_1","doi-asserted-by":"publisher","DOI":"10.1137\/060661491"},{"key":"e_1_3_3_66_1","doi-asserted-by":"publisher","DOI":"10.1002\/nla.1928"},{"key":"e_1_3_3_67_1","doi-asserted-by":"publisher","unstructured":"Wallis JR (1983) Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration. In: SPE Reservoir Simulation Symposium San Francisco California November 1983 Society of Petroleum Engineers. DOI: 10.2118\/12265-ms.","DOI":"10.2118\/12265-ms"},{"key":"e_1_3_3_68_1","doi-asserted-by":"publisher","unstructured":"Wallis JR Kendall RP Little LE (1985) Constrained residual acceleration of conjugate residual methods. In: SPE Reservoir Simulation Symposium Dallas Texas February 1985 Society of Petroleum Engineers (SPE). DOI: 10.2118\/13536-MS.","DOI":"10.2118\/13536-MS"},{"key":"e_1_3_3_69_1","doi-asserted-by":"crossref","unstructured":"Wang M Klie H Parashar M et al. (2009) Solving sparse linear systems on nvidia tesla GPUs. In: International Conference on Computational Science Baton Rouge LA USA 25\u201327 May 2009 pp. 864\u2013873 Springer.","DOI":"10.1007\/978-3-642-01970-8_87"},{"key":"e_1_3_3_70_1","doi-asserted-by":"publisher","unstructured":"Wang L Osei-Kuffuor D Falgout RD et al. (2017) Multigrid reduction for coupled flow problems with application to reservoir simulation. In: SPE Reservoir Simulation Conference Montgomery Texas USA February 2017 Society of Petroleum Engineers (SPE). DOI: 10.2118\/182723-MS.","DOI":"10.2118\/182723-MS"},{"key":"e_1_3_3_71_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1078409"},{"key":"e_1_3_3_72_1","doi-asserted-by":"crossref","unstructured":"Yamazaki I (2011) Pdslin user guide.","DOI":"10.2172\/1050673"},{"key":"e_1_3_3_73_1","doi-asserted-by":"crossref","unstructured":"Yamazaki I Heinlein A Rajamanickam S (2023) An experimental study of two-level schwarz domain-decomposition preconditioners on gpus. In: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 15\u201319 May 2023 St. Petersburg FL USA pp. 680\u2013689 IEEE.","DOI":"10.1109\/IPDPS54959.2023.00073"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251319334","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420251319334","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251319334","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251319334","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,25]],"date-time":"2025-05-25T04:18:25Z","timestamp":1748146705000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420251319334"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,23]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.1177\/10943420251319334"],"URL":"https:\/\/doi.org\/10.1177\/10943420251319334","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,23]]}}}