{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T22:52:16Z","timestamp":1777675936839,"version":"3.51.4"},"reference-count":31,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2020,12,28]],"date-time":"2020-12-28T00:00:00Z","timestamp":1609113600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2021,3]]},"abstract":"<jats:p>Point-block matrices arise naturally in multiphysics problems when all variables associated with a mesh point are ordered together, and are different from the general block matrices since the sizes of the blocks are so small one can often invert some of the diagonal blocks explicitly. Motivated by the recent works of Chow and Patel and Chow et al., we propose an efficient incomplete LU (ILU) preconditioner for point-block matrices targeting applications on GPU. The construction of the preconditioner involves two critical steps: (1) the initial guessing of values for the lower and upper triangular matrices; and (2) several sweeps of asynchronous updating of the triangular matrices. Three representative problems are studied to show the advantage of the proposed point-block approach over the standard point-wise approach in terms of the number of GMRES iterations and also the total compute time. 
Moreover, we compare the proposed algorithm with the level-scheduling based parallel algorithm employed in NVIDIA\u2019s cuSPARSE library as well as the serial method implemented in Intel MKL library, and the experiments show that a 2\u00d7\u20135\u00d7 speedup can be achieved over the block-based ILU(p) factorizations from the cuSPARSE library.<\/jats:p>","DOI":"10.1177\/1094342020981153","type":"journal-article","created":{"date-parts":[[2020,12,28]],"date-time":"2020-12-28T04:25:27Z","timestamp":1609129527000},"page":"121-135","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Point-block incomplete LU preconditioning with asynchronous iterations on GPU for multiphysics problems"],"prefix":"10.1177","volume":"35","author":[{"given":"Wenpeng","family":"Ma","sequence":"first","affiliation":[{"name":"School of Computer and Information Technology, Xinyang Normal University, Henan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0296-8640","authenticated-orcid":false,"given":"Xiao-Chuan","family":"Cai","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Macau, Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2020,12,28]]},"reference":[{"key":"bibr1-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2016.12.009"},{"key":"bibr2-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3874"},{"key":"bibr3-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053389000056"},{"key":"bibr4-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.05.006"},{"key":"bibr5-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1007\/BF01932750"},{"key":"bibr6-1094342020981153","unstructured":"Balay S, Abhyankar S, Adams MF, et al. (2020) PETSc web page. 
Available at: https:\/\/www.mcs.anl.gov\/petsc (accessed 25 October 2020)."},{"key":"bibr7-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-017-2764-7"},{"key":"bibr8-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/140968896"},{"key":"bibr9-1094342020981153","first-page":"1","volume":"9137","author":"Chow E","year":"2015","journal-title":"ISC HIGH PERFORMANCE 2015, Lecture notes in computer science"},{"key":"bibr10-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049670"},{"key":"bibr11-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.42"},{"key":"bibr12-1094342020981153","unstructured":"Intel Math Kernel Library Documentation (2017) Available at: https:\/\/software.intel.com\/en-us\/articles\/intel-math-kernel-library-documentation (accessed 30 October 2020)."},{"key":"bibr13-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/S0024-3795(00)00146-4"},{"key":"bibr14-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016646437"},{"key":"bibr15-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"bibr16-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2020.109312"},{"key":"bibr17-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2015.07.005"},{"key":"bibr18-1094342020981153","doi-asserted-by":"publisher","DOI":"10.2514\/6.2015-3055"},{"key":"bibr19-1094342020981153","unstructured":"Nguyen Loc Q (2017) Quick start guide for Intel Xeon Phi processor x200 product family. 
Available at: https:\/\/software.intel.com\/en-us\/articles\/quick-start-guide-for-the-intel-xeon-phi-processor-x200-product-family (accessed 1 November 2020)."},{"key":"bibr20-1094342020981153","unstructured":"NVIDIA cuSPARSE library (2014) Available at: https:\/\/developer.nvidia.com\/cuda-toolkit-65 (accessed 10 March 2020)."},{"key":"bibr21-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(97)00026-4"},{"key":"bibr22-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/0724090"},{"key":"bibr23-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.06.004"},{"key":"bibr24-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/15M1026419"},{"key":"bibr25-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"bibr26-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/S106482759732753X"},{"key":"bibr27-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479898341268"},{"issue":"2","key":"bibr28-1094342020981153","first-page":"1","volume":"4","author":"Yang B","year":"2015","journal-title":"Journal of Geology & Geophysics"},{"key":"bibr29-1094342020981153","volume-title":"27th IEEE international parallel & distributed processing symposium workshops & PhD forum (IPDPSW\u201913)","author":"Yang C","year":"2014"},{"key":"bibr30-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-05789-7_79"},{"key":"bibr31-1094342020981153","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022328131952"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020981153","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342020981153","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020981153","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:17:11Z","timestamp":1777450631000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342020981153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,28]]},"references-count":31,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,3]]}},"alternative-id":["10.1177\/1094342020981153"],"URL":"https:\/\/doi.org\/10.1177\/1094342020981153","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,28]]}}}