{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:40:05Z","timestamp":1766220005156,"version":"3.48.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","funder":[{"name":"The National Key Research and Development Program of China","award":["2023YFA1011704"],"award-info":[{"award-number":["2023YFA1011704"]}]},{"name":"The National Key Research and Development Program of China","award":["2021YFB0300101"],"award-info":[{"award-number":["2021YFB0300101"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,9,8]]},"DOI":"10.1145\/3754598.3754624","type":"proceedings-article","created":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:34:32Z","timestamp":1766219672000},"page":"553-563","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimizing Incomplete Cholesky Factorization on MIMD Many-core Architecture"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0256-325X","authenticated-orcid":false,"given":"Yongzhen","family":"Shi","sequence":"first","affiliation":[{"name":"Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, China and College of Computer Science and Technology, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8286-6566","authenticated-orcid":false,"given":"Qinglin","family":"Wang","sequence":"additional","affiliation":[{"name":"Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; National Key Laboratory of Parallel and Distributed Computing, National 
University of Defense Technology, Changsha, China and College of Computer Science and Technology, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3745-7541","authenticated-orcid":false,"given":"Jie","family":"Liu","sequence":"additional","affiliation":[{"name":"Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, China and College of Computer Science and Technology, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4548-6600","authenticated-orcid":false,"given":"Lian","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanxi Supercomputing Center, Lvliang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5870-0145","authenticated-orcid":false,"given":"Zhiyan","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanxi Supercomputing Center, Lvliang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6434-6235","authenticated-orcid":false,"given":"Bingwei","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanxi Supercomputing Center, Lvliang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8775-4836","authenticated-orcid":false,"given":"Feiming","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanxi Supercomputing Center, Lvliang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5454-9482","authenticated-orcid":false,"given":"Xiangdong","family":"Pei","sequence":"additional","affiliation":[{"name":"Shanxi Supercomputing Center, Lvliang, China"}]}],"member":"320","published-online":{"date-parts":[[2025,12,20]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Hartwig Anzt Terry Cojean Goran Flegar Fritz G\u00f6bel Thomas Gr\u00fctzmacher Pratik Nayak Tobias Ribizel Yuhsiang\u00a0Mike Tsai 
and Enrique\u00a0S. Quintana-Ort\u00ed. 2022. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing. ACM Trans. Math. Software 48 1 (Feb. 2022) 2:1\u20132:33.","DOI":"10.1145\/3480935"},{"key":"e_1_3_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Tatsumi Aoyama K-I. Ishikawa Yasuyuki Kimura Hideo Matsufuru Atsushi Sato Tomohiro Suzuki and Sunao Torii. 2016. First Application of Lattice QCD to Pezy-SC Processor. Procedia Computer Science 80 (2016) 1418\u20131427.","DOI":"10.1016\/j.procs.2016.05.457"},{"key":"e_1_3_3_1_4_2","unstructured":"Satish Balay Shrirang Abhyankar Mark\u00a0F. Adams Steven Benson Jed Brown Peter Brune Kris Buschelman Emil\u00a0M. Constantinescu Lisandro Dalcin Alp Dener Victor Eijkhout Jacob Faibussowitsch William\u00a0D. Gropp V\u00e1clav Hapla Tobin Isaac Pierre Jolivet Dmitry Karpeev Dinesh Kaushik Matthew\u00a0G. Knepley Fande Kong Scott Kruger Dave\u00a0A. May Lois\u00a0Curfman McInnes Richard\u00a0Tran Mills Lawrence Mitchell Todd Munson Jose\u00a0E. Roman Karl Rupp Patrick Sanan Jason Sarich Barry\u00a0F. Smith Stefano Zampini Hong Zhang Hong Zhang and Junchao Zhang. 2025. PETSc Web page. https:\/\/petsc.org\/"},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"crossref","unstructured":"N Bitoulas and M Papadrakakis. 1994. An optimized computer implementation of incomplete Cholesky factorization. Computing Systems in Engineering 5 3 (1994) 265\u2013274.","DOI":"10.1016\/0956-0521(94)90005-1"},{"key":"e_1_3_3_1_6_2","first-page":"917","volume-title":"Encyclopedia of parallel computing","author":"Bollh\u00f6fer Matthias","year":"2011","unstructured":"Matthias Bollh\u00f6fer, Jos\u00e9\u00a0I Aliaga, Alberto\u00a0F Mart\u0131n, and Enrique\u00a0S Quintana-Ort\u00ed. 2011. ILUPACK. In Encyclopedia of parallel computing. 917\u2013926."},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Matthias Bollh\u00f6fer and Yousef Saad. 2006. Multilevel Preconditioners Constructed From Inverse-Based ILUs. 
SIAM Journal on Scientific Computing 27 5 (2006) 1627\u20131650.","DOI":"10.1137\/040608374"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Li Chen Shuisheng Zhou Jiajun Ma and Mingliang Xu. 2021. Fast kernel k-means clustering using incomplete Cholesky factorization. Appl. Math. Comput. 402 (2021) 126037.","DOI":"10.1016\/j.amc.2021.126037"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Edmond Chow and Aftab Patel. 2015. Fine-Grained Parallel Incomplete LU Factorization. SIAM Journal on Scientific Computing 37 2 (2015) C169\u2013C193.","DOI":"10.1137\/140968896"},{"key":"e_1_3_3_1_10_2","volume-title":"Intel Xeon 4314 CPU","author":"Corporation Intel","year":"2021","unstructured":"Intel Corporation. 2021. Intel Xeon 4314 CPU. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/215269\/intel-xeon-silver-4314-processor-24m-cache-2-40-ghz\/specifications.html"},{"key":"e_1_3_3_1_11_2","unstructured":"NVIDIA Corporation. 2022. Nvidia A30 Tensor Core GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/products\/a30-gpu\/"},{"key":"e_1_3_3_1_12_2","unstructured":"NVIDIA Corporation. 2023. Nvidia CUSPARSE. https:\/\/docs.nvidia.com\/cuda\/cusparse\/"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Timothy\u00a0A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38 1 Article 1 (Dec. 2011) 25\u00a0pages.","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1109\/HPCA.2011.5749714","volume-title":"2011 IEEE 17th international symposium on high performance computer architecture","author":"Fung Wilson\u00a0WL","year":"2011","unstructured":"Wilson\u00a0WL Fung and Tor\u00a0M Aamodt. 2011. Thread block compaction for efficient SIMT control flow. In 2011 IEEE 17th international symposium on high performance computer architecture. 
25\u201336."},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Gene\u00a0H Golub and Charles\u00a0F Van\u00a0Loan. 2013. Matrix computations. JHU press.","DOI":"10.56021\/9781421407944"},{"key":"e_1_3_3_1_16_2","first-page":"22","volume-title":"International Conference on Algorithms and Architectures for Parallel Processing","author":"Guo Jihu","year":"2023","unstructured":"Jihu Guo, Jie Liu, Qinglin Wang, and Xiaoxiong Zhu. 2023. Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. In International Conference on Algorithms and Architectures for Parallel Processing. 22\u201339."},{"key":"e_1_3_3_1_17_2","unstructured":"Naoya Hatta Shuntaro Tsunoda Kouhei Uchida Taichi Ishitani Ryota Shioya and Kei Ishii. 2022. PEZY-SC3: A MIMD Many-core Processor for Energy-efficient Computing. arXiv preprint arXiv:2301.07510 (2022)."},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Pascal H\u00e9non Pierre Ramet and Jean Roman. 2008. On finding approximate supernodes for an efficient block-ILU(k) factorization. Parallel Comput. 34 6 (2008) 345\u2013362. Parallel Matrix Algorithms and Applications.","DOI":"10.1016\/j.parco.2007.12.003"},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"crossref","unstructured":"James Hook Jennifer Scott Francoise Tisseur and Jonathan Hogg. 2018. A Max-Plus Approach to Incomplete Cholesky Factorization Preconditioners. SIAM Journal on Scientific Computing 40 4 (2018) A1987\u2013A2004.","DOI":"10.1137\/16M1107735"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Takeshi Iwashita Naokazu Takemura Akihiro Ida and Hiroshi Nakashima. 2015. A New Fill-in Strategy for IC Factorization Preconditioning Considering SIMD Instructions. 2015 IEEE Trustcom\/BigDataSE\/ISPA 3 (2015) 37\u201344.","DOI":"10.1109\/Trustcom.2015.610"},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Mark\u00a0T Jones and Paul\u00a0E Plassmann. 1993. 
A Parallel Graph Coloring Heuristic. SIAM Journal on Scientific Computing 14 3 (1993) 654\u2013669.","DOI":"10.1137\/0914041"},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"crossref","unstructured":"Mark\u00a0T Jones and Paul\u00a0E Plassmann. 1995. An improved incomplete Cholesky factorization. ACM Transactions on Mathematical Software (TOMS) 21 1 (1995) 5\u201317.","DOI":"10.1145\/200979.200981"},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"crossref","unstructured":"George Karypis and Vipin Kumar. 1996. Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs. (1996) 35\u2013es.","DOI":"10.1145\/369028.369103"},{"key":"e_1_3_3_1_24_2","unstructured":"Kyungjoo Kim Sivasankaran Rajamanickam George Stelle H\u00a0Carter Edwards and Stephen\u00a0L Olivier. 2016. Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout. arXiv preprint arXiv:1601.05871 (2016)."},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Liang Li Tingzhu Huang Yan-Fei Jing and Zhi\u2010Gang Ren. 2015. Effective preconditioning through minimum degree ordering interleaved with incomplete factorization. Journal of computational and applied mathematics 279 (2015) 225\u2013232.","DOI":"10.1016\/j.cam.2014.11.010"},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Ruipeng Li and Yousef Saad. 2013. GPU-accelerated preconditioned iterative linear solvers. The Journal of Supercomputing 63 (2013) 443\u2013466.","DOI":"10.1007\/s11227-012-0825-3"},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Chih-Jen Lin and Jorge\u00a0J. Mor\u00e9. 1999. Incomplete Cholesky Factorizations with Limited Memory. SIAM J. Sci. Comput. 21 (1999) 24\u201345.","DOI":"10.1137\/S1064827597327334"},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Hao Lu Mahantesh\u00a0M. Halappanavar Daniel\u00a0G. Chavarr\u00eda-Miranda Assefaw\u00a0Hadish Gebremedhin Ajay Panyala and A. Kalyanaraman. 2017. 
Algorithms for Balanced Graph Colorings with Applications in Parallel Computing. IEEE Transactions on Parallel and Distributed Systems 28 (2017) 1240\u20131256.","DOI":"10.1109\/TPDS.2016.2620142"},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"crossref","unstructured":"Kazuya Matsumoto Naohito Nakasato and Toshiaki Hishinuma. 2019. Effectiveness of performance tuning techniques for general matrix multiplication on the PEZY-SC2. Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (2019) 1\u20136.","DOI":"10.1145\/3337801.3337817"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Artem Napov. 2023. An Incomplete Cholesky Preconditioner Based on Orthogonal Approximations. SIAM Journal on Scientific Computing 45 2 (2023) A729\u2013A752.","DOI":"10.1137\/21M1468334"},{"key":"e_1_3_3_1_31_2","volume-title":"Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU","author":"Naumov Maxim","year":"2012","unstructured":"Maxim Naumov. 2012. Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU. Technical Report."},{"key":"e_1_3_3_1_32_2","volume-title":"Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU","author":"Naumov Maxim","year":"2015","unstructured":"Maxim Naumov, Patrice Castonguay, and Jonathan Cohen. 2015. Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU. Technical Report."},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198515760.001.0001","volume-title":"Introduction to Parallel Computing: A practical guide with examples in C","author":"Petersen Wesley","year":"2004","unstructured":"Wesley Petersen and Peter Arbenz. 2004. Introduction to Parallel Computing: A practical guide with examples in C. 
New York: Oxford University Press."},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Yousef Saad. 2003. Iterative methods for sparse linear systems. SIAM.","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_3_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Andr\u00e9\u00a0Kubagawa Sato Thiago\u00a0Castro Martins and Marcos Sales\u00a0Guerra Tsuzuki. 2023. GPU implementation of an incomplete Cholesky conjugate gradient solver for a FEM-generated system using full kernel consolidation. Soft Computing 27 14 (2023) 9307\u20139320.","DOI":"10.1007\/s00500-023-08125-9"},{"key":"e_1_3_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Jennifer Scott and Miroslav T\u016fma. 2014. HSL_MI28: An Efficient and Robust Limited-Memory Incomplete Cholesky Factorization Code. ACM Trans. Math. Softw. 40 4 Article 24 (July 2014) 19\u00a0pages.","DOI":"10.1145\/2617555"},{"key":"e_1_3_3_1_37_2","volume-title":"The 22nd Green500 List - November 2021","year":"2021","unstructured":"TOP500. 2021. The 22nd Green500 List - November 2021. https:\/\/top500.org\/lists\/green500\/list\/2021\/11"},{"key":"e_1_3_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Takumi Washio Xiaoke Cui Ryo Kanada Jun-ichi Okada Seiryo Sugiura Yasushi Okuno Shoji Takada and Toshiaki Hisada. 2022. Using incomplete Cholesky factorization to increase the time step in molecular dynamics simulations. J. Comput. Appl. Math. 415 (2022) 114519.","DOI":"10.1016\/j.cam.2022.114519"},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"crossref","unstructured":"Yuejin Ye Heng Guo Bingzhuo Wang Pengxiao Wang Dexun Chen and Fang Li. 2023. Coupled Incomplete Cholesky and Jacobi Preconditioned Conjugate Gradient on the New Generation of Sunway Many-Core Architecture. IEEE Trans. Comput. 72 11 (2023) 3326\u20133339.","DOI":"10.1109\/TC.2023.3296884"},{"key":"e_1_3_3_1_40_2","doi-asserted-by":"crossref","unstructured":"Fan Yuan Xiaojian Yang Shengguo Li Dezun Dong Chun Huang and Zheng Wang. 2024. 
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores. IEEE Transactions on Parallel and Distributed Systems 35 (2024) 768\u2013779.","DOI":"10.1109\/TPDS.2024.3372473"}],"event":{"name":"ICPP '25: 54th International Conference on Parallel Processing","location":"San Diego CA USA","acronym":"ICPP '25"},"container-title":["Proceedings of the 54th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3754598.3754624","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T08:38:54Z","timestamp":1766219934000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3754598.3754624"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,8]]},"references-count":39,"alternative-id":["10.1145\/3754598.3754624","10.1145\/3754598"],"URL":"https:\/\/doi.org\/10.1145\/3754598.3754624","relation":{},"subject":[],"published":{"date-parts":[[2025,9,8]]},"assertion":[{"value":"2025-12-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}