{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T23:14:43Z","timestamp":1776122083874,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272474"],"award-info":[{"award-number":["62272474"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>\n            Sparse iterative solvers are commonly used in various fields. However, certain essential kernels of these solvers, such as sparse triangular solves (SpTRSV), present significant challenges for efficient parallelization due to\n            <jats:italic toggle=\"yes\">data dependencies<\/jats:italic>\n            . Previous methods, like level-scheduling or multi-coloring, typically involve creating a Task Dependency Graph (TDG) to represent data dependencies and identify independent sets from the TDG for parallel execution. However, these approaches often result in limited parallelism with substantial synchronization overheads or negatively impact the solver convergence rate.\n          <\/jats:p>\n          <jats:p>\n            This article introduces\n            <jats:italic toggle=\"yes\">DCSolver<\/jats:italic>\n            , a Divide-and-Conquer (DC) framework designed to efficiently parallelize sparse solvers with data dependencies on GPUs. To achieve this, we break down the solver TDG into independent subgraphs, allowing us to exploit both coarse-grained and fine-grained parallelism. 
To efficiently allocate GPU threads for subgraphs with varying degrees of parallelism, we have developed an adaptive in-warp scheduling strategy. Additionally, we propose a\n            <jats:italic toggle=\"yes\">hybrid<\/jats:italic>\n            parallelization scheme in DCSolver, which involves employing different parallel approaches for different DC recursions to achieve a more optimal balance between parallelism and convergence for solvers. To evaluate the effectiveness of DCSolver, we apply it to two preconditioned Krylov subspace solvers and an unstructured mesh Computational Fluid Dynamics (CFD) solver. Our results show that when compared with the state-of-the-art methods, DCSolver accelerates the time-to-solution of solvers by an average speedup of up to 26.19X.\n          <\/jats:p>","DOI":"10.1145\/3746233","type":"journal-article","created":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T07:02:02Z","timestamp":1750921322000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["DCSolver: Accelerating Sparse Iterative Solvers via Divide-and-Conquer on GPUs"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-0434-8075","authenticated-orcid":false,"given":"Haozhong","family":"Qiu","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4876-2368","authenticated-orcid":false,"given":"Chuanfu","family":"Xu","sequence":"additional","affiliation":[{"name":"Laboratory of Digitizing Software for Frontier Equipment, College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3542-4869","authenticated-orcid":false,"given":"Jianbin","family":"Fang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7676-7609","authenticated-orcid":false,"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[{"name":"China Aerodynamics Research and Development Center","place":["Mianyang, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1444-4588","authenticated-orcid":false,"given":"Liang","family":"Deng","sequence":"additional","affiliation":[{"name":"China Aerodynamics Research and Development Center","place":["Mianyang, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0885-2845","authenticated-orcid":false,"given":"Zhe","family":"Dai","sequence":"additional","affiliation":[{"name":"China Aerodynamics Research and Development Center","place":["Mianyang, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1025-5238","authenticated-orcid":false,"given":"Yue","family":"Ding","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7571-7048","authenticated-orcid":false,"given":"Yue","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9522-601X","authenticated-orcid":false,"given":"Zhimeng","family":"Han","sequence":"additional","affiliation":[{"name":"National University of Defense 
Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6906-4940","authenticated-orcid":false,"given":"Yonggang","family":"Che","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3745-7541","authenticated-orcid":false,"given":"Jie","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,19]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053389000056"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1631"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/C2013-0-19038-1"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0965-9978(96)00039-7"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.7527\/S1000-6893.2021.25739"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(99)00064-2"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/646667.699892"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","unstructured":"Assefaw H. Gebremedhin Duc Nguyen Md. Mostofa Ali Patwary and Alex Pothen. 2013. ColPack: Software for graph coloring and related problems in scientific computing. 40 1 Article 1 (oct2013) 31 pages. 
DOI:10.1145\/2513109.2513110","DOI":"10.1145\/2513109.2513110"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1137\/0710032"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.2514\/6.2021-0855"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.68"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42514-020-00030-z"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.51"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1021738303840"},{"key":"e_1_3_2_17_2","first-page":"83","volume-title":"Proc. of Copper Mountain Conference on Iterative Methods","volume":"2","author":"Jones Mark T","year":"1992","unstructured":"Mark T Jones and Paul E Plassmann. 1992. The effect of many color orderings on the convergence of iterative methods. In Proc. of Copper Mountain Conference on Iterative Methods, Vol. 2. Citeseer, 83\u201393."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807667"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_3_2_20_2","first-page":"225","article-title":"Multi-threaded graph partitioning","author":"LaSalle Dominique","year":"2013","unstructured":"Dominique LaSalle and George Karypis. 2013. Multi-threaded graph partitioning. International Symposium on Parallel and Distributed Processing (2013), 225\u2013236. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:5559924","journal-title":"International Symposium on Parallel and Distributed Processing"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-016-1943-0"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"e_1_3_2_23_2","article-title":"Experimental investigation of the CHN-T1 model in FL-13 wind tunnel of CARDC and DNW-LLF facility","author":"Litao Fan","year":"2020","unstructured":"Fan Litao and Zhang Hui. 2020. 
Experimental investigation of the CHN-T1 model in FL-13 wind tunnel of CARDC and DNW-LLF facility. Congress of the International Council of the Aeronautical Sciences (2020).","journal-title":"Congress of the International Council of the Aeronautical Sciences"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_45"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2620142"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.2514\/6.1997-2010"},{"key":"e_1_3_2_27_2","article-title":"Graph coloring problems and their applications in scheduling","volume":"48","author":"Marx D","year":"2003","unstructured":"D. Marx. 2003. Graph coloring problems and their applications in scheduling. Periodica Polytechnica, Electrical Engineering 48 (Oct. 2003).","journal-title":"Periodica Polytechnica, Electrical Engineering"},{"key":"e_1_3_2_28_2","unstructured":"Maxim Naumov. 2012. Parallel Incomplete-LU and Cholesky factorization in the preconditioned iterative methods on the GPU. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:10340972"},{"key":"e_1_3_2_29_2","volume-title":"Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU","author":"Naumov Maxim","year":"2015","unstructured":"Maxim Naumov, Patrice Castonguay, and Jonathan Cohen. 2015. Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU. Technical Report NVR-2015-001. 
NVIDIA."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07518-1_8"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.82"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1002\/gamm.202000015"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627535.3638473"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503287"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3432261.3432271"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/3037529.3037535"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2013.10.008"},{"key":"e_1_3_2_39_2","volume-title":"Proc. of Symposium on CYBER 205 Applications","author":"Schreiber Robert","year":"1982","unstructured":"Robert Schreiber and W. Tang. 1982. Vectorizing the conjugate gradient method. In Proc. of Symposium on CYBER 205 Applications."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.2514\/2.392"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404397.3404400"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2012.23"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688517"},{"key":"e_1_3_2_44_2","article-title":"cuSPARSE library","author":"Naumov M.","year":"2010","unstructured":"M. Naumov, L. Chien, P. Vandermersch, and U. Kapasi. 2010. cuSPARSE library. 
GPU Technology Conference (GTC) (2010).","journal-title":"GPU Technology Conference (GTC)"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3085578"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.2514\/3.10007"},{"key":"e_1_3_2_47_2","article-title":"Parallelization of Gauss-Seidel relaxation for real gas flow","author":"Yoon Seokkwan","year":"2005","unstructured":"Seokkwan Yoon, Gabriele Jost, and Sherry Chang. 2005. Parallelization of Gauss-Seidel relaxation for real gas flow. NAS Technical Report NAS-05-011 (2005).","journal-title":"NAS Technical Report NAS-05-011"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3066635"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746233","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T00:48:30Z","timestamp":1758329310000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746233"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,19]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3746233"],"URL":"https:\/\/doi.org\/10.1145\/3746233","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,19]]},"assertion":[{"value":"2024-12-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-09-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}