{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:12:30Z","timestamp":1761808350998,"version":"3.38.0"},"reference-count":41,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T00:00:00Z","timestamp":1581552000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"The research work of Byron E. Moutafis, as a PhD candidate, was funded by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI).","award":["Byron E. Moutafis, Grant-Code: 1609"],"award-info":[{"award-number":["Byron E. Moutafis, Grant-Code: 1609"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2020,5]]},"abstract":"<jats:p> The state-of-the-art supercomputing infrastructures are equipped with accelerators, such as graphics processing units (GPUs), that operate as coprocessors for each workstation of the distributed memory system. The multi-projection type methods are a class of algebraic domain decomposition methods based on semi-aggregation techniques. The multi-projection type methods have improved convergence behavior, as the number of subdomains increases, due to the corresponding augmentation of the semi-aggregated local linear systems with more coarse components, while the number of fine components is reduced. Moreover, limited amount of communications among the workstations is required by the proposed method. The utilization of the available GPUs allows an increase in the number of subdomains along with finer-grained parallelism, leading to improved performance. A load-balancing algorithm that ensures the concurrency of the computations on multicore processors and GPUs is proposed. Flexible parallel preconditioned Krylov subspace iterative methods enhanced with multi-projection type methods have been designed appropriately in order to have improved performance, compared to CPU-only or GPU-only executions, by exploiting the available CPUs and GPUs of the distributed memory system concurrently. The unsymmetric local linear systems are solved by the preconditioned Bi-Conjugate Gradient STABilized (BiCGSTAB) method enhanced with the modified generic factored approximate sparse inverse preconditioner, whereas the preconditioned conjugate gradient (CG) method along with the symmetric factored approximate sparse inverse preconditioner is used for the symmetric positive definite local coefficient matrices. Numerical results regarding the convergence behavior, the performance, and the scalability of the proposed method for several problems are given. <\/jats:p>","DOI":"10.1177\/1094342020905637","type":"journal-article","created":{"date-parts":[[2020,2,14]],"date-time":"2020-02-14T02:58:30Z","timestamp":1581649110000},"page":"282-305","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Hybrid multi-projection method using sparse approximate inverses on GPU clusters"],"prefix":"10.1177","volume":"34","author":[{"given":"Byron E","family":"Moutafis","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1562-3633","authenticated-orcid":false,"given":"George A","family":"Gravvanis","sequence":"additional","affiliation":[]},{"given":"Christos K","family":"Filelis-Papadopoulos","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, School of Engineering University Campus, Democritus University of Thrace, Kimmeria, Xanthi, Greece"}]}],"member":"179","published-online":{"date-parts":[[2020,2,13]]},"reference":[{"key":"bibr1-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48096-0_50"},{"key":"bibr2-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.05.006"},{"key":"bibr3-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015580139"},{"key":"bibr4-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063478"},{"key":"bibr5-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/140974717"},{"key":"bibr6-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827500381045"},{"key":"bibr7-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/s11075-012-9605-7"},{"key":"bibr8-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/S106482759732678X"},{"key":"bibr9-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1017\/S0962492900002427"},{"key":"bibr10-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/S106482759833913X"},{"key":"bibr11-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1177\/109434200101500106"},{"issue":"1","key":"bibr12-1094342020905637","first-page":"1","volume":"38","author":"Davis TA","year":"2011","journal-title":"ACM Transactions on Mathematical Software (TOMS)"},{"key":"bibr13-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1002\/1099-1506(200010\/12)7:7\/8<687::AID-NLA219>3.0.CO;2-S"},{"key":"bibr14-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2013.07.049"},{"key":"bibr15-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1108\/EC-12-2014-0261"},{"key":"bibr16-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827597323415"},{"key":"bibr17-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-011-0619-z"},{"key":"bibr18-1094342020905637","doi-asserted-by":"publisher","DOI":"10.6028\/jres.049.044"},{"volume-title":"Communication-Avoiding Krylov Subspace Methods","year":"2010","author":"Hoemmen M","key":"bibr19-1094342020905637"},{"key":"bibr20-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"bibr21-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/0614004"},{"key":"bibr22-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1142\/S0219876218500500"},{"key":"bibr23-1094342020905637","first-page":"45","volume-title":"2nd International workshop on GPUs and scientific applications (GPUSCA 2011)","author":"Luo L","year":"2011"},{"key":"bibr24-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-2437-5_8"},{"key":"bibr25-1094342020905637","first-page":"224","volume":"6","author":"Mitchell WF","year":"1997","journal-title":"Electronic Transactions on Numerical Analysis"},{"key":"bibr26-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2017.08.020"},{"key":"bibr27-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1155\/2017\/2580820"},{"key":"bibr28-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/04061129X"},{"issue":"6","key":"bibr29-1094342020905637","first-page":"123","volume":"37","author":"Notay Y","year":"2010","journal-title":"Electronic Transactions on Numerical Analysis"},{"volume-title":"In proceedings of the XXVIII international symposium on lattice field TheoryPoS, LATTICE2010 036","year":"2010","author":"Osaki Y","key":"bibr30-1094342020905637"},{"key":"bibr31-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.cma.2011.01.013"},{"key":"bibr32-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/0914028"},{"key":"bibr33-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"bibr34-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2003.07.011"},{"key":"bibr35-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/BF02165096"},{"volume-title":"Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations","year":"2004","author":"Smith B","key":"bibr36-1094342020905637"},{"key":"bibr37-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1007\/b137868"},{"key":"bibr38-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1137\/0913035"},{"key":"bibr39-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.48"},{"key":"bibr40-1094342020905637","first-page":"933","volume-title":"Proceedings of the international conference for high performance computing, networking, storage and analysis","author":"Yamazaki I","year":"2014"},{"key":"bibr41-1094342020905637","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2016.08.033"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020905637","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342020905637","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020905637","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T08:51:25Z","timestamp":1740905485000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342020905637"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,13]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["10.1177\/1094342020905637"],"URL":"https:\/\/doi.org\/10.1177\/1094342020905637","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2020,2,13]]}}}