{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:40Z","timestamp":1750219780710,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,3,29]],"date-time":"2023-03-29T00:00:00Z","timestamp":1680048000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>\n            In this article, we present a CUDA library with a C API for solving block cyclic tridiagonal and banded systems on one GPU. The library can process block tridiagonal systems with block sizes from 1 \u00d7 1 (scalar) to 4 \u00d7 4 and banded systems with up to four sub- and superdiagonals. For the compute-intensive block size cases and cases with many right-hand sides, we write out an explicit factorization to memory; however, for the scalar case, the fastest approach is to only output the coarse system and recompute the factorization. Prominent features of the library are (scaled) partial pivoting for improved numeric stability; highest-performance kernels, which completely utilize GPU memory bandwidth; and support for multiple sparse or dense right-hand side and solution vectors. The additional memory consumption is only 5% of the original tridiagonal system, which enables the solution of systems up to GPU memory size. The performance of the state-of-the-art scalar tridiagonal solver of cuSPARSE is outperformed by factor 5 for large problem sizes of 2\n            <jats:sup>25<\/jats:sup>\n            unknowns, on a GeForce RTX 2080 Ti.\n          <\/jats:p>","DOI":"10.1145\/3580373","type":"journal-article","created":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T12:06:11Z","timestamp":1675166771000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Tridigpu: A GPU Library for Block Tridiagonal and Banded Linear Equation Systems"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2019-6074","authenticated-orcid":false,"given":"Christoph","family":"Klein","sequence":"first","affiliation":[{"name":"Institute of Computer Engineering (ZITI), Heidelberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0468-0472","authenticated-orcid":false,"given":"Robert","family":"Strzodka","sequence":"additional","affiliation":[{"name":"Institute of Computer Engineering (ZITI), Heidelberg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2023,3,29]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2133803.2345676"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/SCCC.2011.29"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.5555\/323215"},{"key":"e_1_3_3_5_2","volume-title":"Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures","author":"Chang Li-Wen","year":"2014","unstructured":"Li-Wen Chang. 2014. Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures. Master\u2019s Thesis. University of Illinois at Urbana-Champaign."},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.12"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1137\/080740167"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.92"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2015.17"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-018-2676-z"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3328731"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2017.2723879"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/0010-4655(82)90016-9"},{"issue":"4","key":"e_1_3_3_15_2","first-page":"1","article-title":"Generalized diagonal pivoting methods for tridiagonal systems without interchanges","volume":"40","author":"Erway Jennifer B.","year":"2010","unstructured":"Jennifer B. Erway, Roummel F. Marcia, and Joseph A. Tyson. 2010. Generalized diagonal pivoting methods for tridiagonal systems without interchanges. IAENG International Journal of Applied Mathematics 40, 4 (2010), 1\u20137.","journal-title":"IAENG International Journal of Applied Mathematics"},{"issue":"2","key":"e_1_3_3_16_2","first-page":"303","article-title":"ADI finite difference schemes for option pricing","volume":"7","author":"Foulon S.","year":"2010","unstructured":"S. Foulon and K. J. in\u2019t Hout. 2010. ADI finite difference schemes for option pricing. International Journal of Numerical Analysis and Modeling 7, 2 (2010), 303\u2013320.","journal-title":"International Journal of Numerical Analysis and Modeling"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/bf01321860"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1137\/20M1311053"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/WHPCF.2014.10"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2019.03.016"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.61"},{"key":"e_1_3_3_22_2","unstructured":"Ga\u00ebl Guennebaud and Beno\u00eet Jacob.2012. Eigen v3. Retrieved February 9 2023 from http:\/\/eigen.tuxfamily.org."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ocemod.2003.10.002"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/321250.321259"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.matcom.2021.07.019"},{"issue":"2006","key":"e_1_3_3_26_2","first-page":"1","article-title":"Interactive depth of field using simulated diffusion on a GPU","author":"Kass Michael","year":"2006","unstructured":"Michael Kass, Aaron Lefohn, and John Owens. 2006. Interactive depth of field using simulated diffusion on a GPU. Computing2006 (2006), 1\u20138.","journal-title":"Computing"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/97879.97884"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.41"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107722"},{"key":"e_1_3_3_30_2","unstructured":"Christoph Klein and Robert Strzodka. tridigpu. n.d. Retrieved February 9 2023 from https:\/\/mp-force.ziti.uni-heidelberg.de\/asc\/code\/tridigpu."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3472484"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830568"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2017.07.014"},{"key":"e_1_3_3_34_2","unstructured":"Duane Merrill. 2021. CUB. Retrieved February 9 2023 from https:\/\/nvlabs.github.io\/cub."},{"key":"e_1_3_3_35_2","unstructured":"NVIDIA. 2022. cuSPARSE Library. Retrieved February 9 2023 from https:\/\/docs.nvidia.com\/cuda\/cusparse."},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/PDP2018.2018.00123"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/0168-9274(86)90002-4"},{"key":"e_1_3_3_38_2","unstructured":"L. Thomas. 1949. Elliptic Problems in Linear Differential Equations over a Network . Watson Scientific Computing Laboratory Report Columbia University New York NY."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.03.008"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/355945.355947"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SPDP.1991.218237"},{"key":"e_1_3_3_42_2","unstructured":"OSGi Alliance. 2010. Semantic Versioning . Technical Whitepaper. OSGi Alliance."}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580373","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580373","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:42Z","timestamp":1750178262000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580373"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,29]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3580373"],"URL":"https:\/\/doi.org\/10.1145\/3580373","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2023,3,29]]},"assertion":[{"value":"2021-12-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}