{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:18:32Z","timestamp":1763468312875,"version":"3.38.0"},"reference-count":39,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2016,5,5]],"date-time":"2016-05-05T00:00:00Z","timestamp":1462406400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2018,3]]},"abstract":"<jats:p> In this paper, we present an optimized GPU implementation for the induced dimension reduction algorithm. We improve data locality, combine it with an efficient sparse matrix vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound. <\/jats:p>","DOI":"10.1177\/1094342016646844","type":"journal-article","created":{"date-parts":[[2016,5,7]],"date-time":"2016-05-07T00:23:40Z","timestamp":1462580620000},"page":"220-230","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs"],"prefix":"10.1177","volume":"32","author":[{"given":"Hartwig","family":"Anzt","sequence":"first","affiliation":[{"name":"University of Tennessee, Knoxville, USA"}]},{"given":"Moritz","family":"Kreutzer","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Eduardo","family":"Ponce","sequence":"additional","affiliation":[{"name":"University of Tennessee, Knoxville, USA"}]},{"given":"Gregory D","family":"Peterson","sequence":"additional","affiliation":[{"name":"University of Tennessee, Knoxville, USA"}]},{"given":"Gerhard","family":"Wellein","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Jack","family":"Dongarra","sequence":"additional","affiliation":[{"name":"University of Tennessee, Knoxville, USA"},{"name":"Oak Ridge National Laboratory, USA"},{"name":"University of Manchester, UK"}]}],"member":"179","published-online":{"date-parts":[[2016,5,5]]},"reference":[{"volume-title":"CUDA Toolkit v7.5","year":"2015","key":"bibr1-1094342016646844"},{"volume-title":"cuSPARSE Toolkit v7.0","year":"2015","key":"bibr2-1094342016646844"},{"key":"bibr3-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.41"},{"key":"bibr4-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48096-0_52"},{"key":"bibr5-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1145\/2834899.2834907"},{"key":"bibr6-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015580139"},{"key":"bibr7-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"bibr8-1094342016646844","unstructured":"Bergman K, Borkar S, Campbell D, (2008) ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems Peter Kogge, Editor & Study Lead. DARPA\/IPTO Program."},{"key":"bibr9-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"bibr10-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1002\/nla.764"},{"key":"bibr11-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14325-5_2"},{"key":"bibr12-1094342016646844","doi-asserted-by":"publisher","DOI":"10.2528\/PIER11031607"},{"journal-title":"CoRR","year":"2013","author":"Filipovic J","key":"bibr13-1094342016646844"},{"volume-title":"4th USENIX workshop on hot topics in parallelism","year":"2012","author":"Gregg C","key":"bibr14-1094342016646844"},{"key":"bibr15-1094342016646844","doi-asserted-by":"publisher","DOI":"10.6028\/jres.049.044"},{"key":"bibr16-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2015.7054182"},{"key":"bibr17-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2011.07.021"},{"key":"bibr18-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1137\/130930352"},{"journal-title":"CoRR","year":"2015","author":"Kreutzer M","key":"bibr19-1094342016646844"},{"key":"bibr20-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"bibr21-1094342016646844","first-page":"1","volume-title":"HPC 2012: Proceedings of the 2012 symposium on high performance computing","author":"Lukash M","year":"2012"},{"key":"bibr22-1094342016646844","unstructured":"MAGMA (2015) MAGMA 1.6.2. Available at: http:\/\/icl.cs.utk.edu\/magma\/ (accessed November 2015)."},{"key":"bibr23-1094342016646844","unstructured":"Lukarski D, Trost N (2015) PARALUTION. Available at: http:\/\/www.paralution.com\/PARALUTION (accessed November 2015)."},{"key":"bibr24-1094342016646844","first-page":"19","author":"McCalpin JD","year":"1995","journal-title":"IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter"},{"key":"bibr25-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"bibr26-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1016\/j.laa.2012.11.021"},{"key":"bibr27-1094342016646844","unstructured":"Rupp K (2015) ViennaCL. Available at: http:\/\/viennacl.sourceforge.net\/ (accessed November 2015)."},{"key":"bibr28-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"bibr29-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1137\/090774756"},{"key":"bibr30-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1137\/070685804"},{"key":"bibr31-1094342016646844","unstructured":"Strohmaier E, Dongarra J, Simon H, Meuer M (2015) The TOP500 list. Available at: http:\/\/www.top.org\/ (accessed November 2015)."},{"key":"bibr32-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-014-1102-4"},{"key":"bibr33-1094342016646844","unstructured":"van Gijzen MB (2015) The induced dimension reduction method. Available at: http:\/\/ta.twi.tudelft.nl\/nw\/users\/gijzen\/IDR.html (accessed November 2015)."},{"key":"bibr34-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1002\/nla.1935"},{"key":"bibr35-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049667"},{"key":"bibr36-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1658"},{"key":"bibr37-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1109\/GreenCom-CPSCom.2010.102"},{"key":"bibr38-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2011.5999803"},{"key":"bibr39-1094342016646844","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016646844","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016646844","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016646844","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T19:17:45Z","timestamp":1740770265000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016646844"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5,5]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,3]]}},"alternative-id":["10.1177\/1094342016646844"],"URL":"https:\/\/doi.org\/10.1177\/1094342016646844","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2016,5,5]]}}}