{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T09:36:48Z","timestamp":1768297008734,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":25,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007144","name":"University of Houston","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007144","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3473522","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:39:57Z","timestamp":1633459197000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization"],"prefix":"10.1145","author":[{"given":"Shaoshuai","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Houston, United States of America"}]},{"given":"Panruo","family":"Wu","sequence":"additional","affiliation":[{"name":"University of Houston, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"International Workshop on Applied Parallel Computing. Springer, 38\u201351","author":"Andersen S","year":"2000","unstructured":"Bjarne\u00a0 S Andersen , Fred Gustavson , Alexander Karaivanov , Minka Marinova , Jerzy Wa\u015bniewski , and Plamen Yalamov . 2000 . LAWRA linear algebra with recursive algorithms . In International Workshop on Applied Parallel Computing. Springer, 38\u201351 . Bjarne\u00a0S Andersen, Fred Gustavson, Alexander Karaivanov, Minka Marinova, Jerzy Wa\u015bniewski, and Plamen Yalamov. 2000. LAWRA linear algebra with recursive algorithms. In International Workshop on Applied Parallel Computing. Springer, 38\u201351."},{"key":"e_1_3_2_1_2_1","volume-title":"LAPACK Users","author":"Anderson Edward","unstructured":"Edward Anderson , Zhaojun Bai , Christian Bischof , L\u00a0Susan Blackford , James Demmel , Jack Dongarra , Jeremy Du\u00a0Croz , Anne Greenbaum , Sven Hammarling , Alan McKenney , 1999. LAPACK Users \u2019 guide. SIAM. Edward Anderson, Zhaojun Bai, Christian Bischof, L\u00a0Susan Blackford, James Demmel, Jack Dongarra, Jeremy Du\u00a0Croz, Anne Greenbaum, Sven Hammarling, Alan McKenney, 1999. LAPACK Users\u2019 guide. SIAM."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"e_1_3_2_1_4_1","volume-title":"ScaLAPACK users","author":"Blackford L\u00a0Susan","unstructured":"L\u00a0Susan Blackford , Jaeyoung Choi , Andy Cleary , Eduardo D\u2019Azevedo , James Demmel , Inderjit Dhillon , Jack Dongarra , Sven Hammarling , Greg Henry , Antoine Petitet , 1997. ScaLAPACK users \u2019 guide. SIAM. L\u00a0Susan Blackford, Jaeyoung Choi, Andy Cleary, Eduardo D\u2019Azevedo, James Demmel, Inderjit Dhillon, Jack Dongarra, Sven Hammarling, Greg Henry, Antoine Petitet, 1997. ScaLAPACK users\u2019 guide. SIAM."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3331057"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/1096-9128(20001225)12:15<1481::AID-CPE540>3.0.CO;2-V"},{"key":"e_1_3_2_1_7_1","volume-title":"Numerical computations with GPUs","author":"Dongarra Jack","unstructured":"Jack Dongarra , Mark Gates , Azzam Haidar , Jakub Kurzak , Piotr Luszczek , Stanimire Tomov , and Ichitaro Yamazaki . 2014. Accelerating numerical dense linear algebra calculations with GPUs . In Numerical computations with GPUs . Springer , 3\u201328. Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki. 2014. Accelerating numerical dense linear algebra calculations with GPUs. In Numerical computations with GPUs. Springer, 3\u201328."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2012.6408679"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.444.0605"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93698-7_45"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00050"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3148226.3148237"},{"key":"e_1_3_2_1_13_1","unstructured":"Zhe Jia Marco Maggioni Benjamin Staiger and Daniele\u00a0P Scarpazza. 2018. Dissecting the nvidia volta gpu architecture via microbenchmarking. arXiv preprint arXiv:1804.06826(2018).  Zhe Jia Marco Maggioni Benjamin Staiger and Daniele\u00a0P Scarpazza. 2018. Dissecting the nvidia volta gpu architecture via microbenchmarking. arXiv preprint arXiv:1804.06826(2018)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-58667-0_9"},{"key":"e_1_3_2_1_15_1","volume-title":"Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD. Concurrency and Computation: Practice and Experience","author":"Lu Yuechao","year":"2020","unstructured":"Yuechao Lu , Ichitaro Yamazaki , Fumihiko Ino , Yasuyuki Matsushita , Stanimire Tomov , and Jack Dongarra . 2020. Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD. Concurrency and Computation: Practice and Experience ( 2020 ), e5754. Yuechao Lu, Ichitaro Yamazaki, Fumihiko Ino, Yasuyuki Matsushita, Stanimire Tomov, and Jack Dongarra. 2020. Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD. Concurrency and Computation: Practice and Experience (2020), e5754."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2018.00091"},{"key":"e_1_3_2_1_17_1","unstructured":"Tesla NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture.  Tesla NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture."},{"key":"e_1_3_2_1_18_1","unstructured":"Elmar Peise and Paolo Bientinesi. 2016. Recursive algorithms for dense linear algebra: The relapack collection. arXiv preprint arXiv:1602.06763(2016).  Elmar Peise and Paolo Bientinesi. 2016. Recursive algorithms for dense linear algebra: The relapack collection. arXiv preprint arXiv:1602.06763(2016)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2331130.2331133"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479896297744"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/236017.236029"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.2002.7090"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926256"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369583.3392685"},{"key":"e_1_3_2_1_25_1","volume-title":"Basic Linear Algebra Operations on TensorCore GPU. In 2020 IEEE\/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA). IEEE, 44\u201352","author":"Zhang Shaoshuai","year":"2020","unstructured":"Shaoshuai Zhang , Vivek Karihaloo , and Panruo Wu . 2020 . Basic Linear Algebra Operations on TensorCore GPU. In 2020 IEEE\/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA). IEEE, 44\u201352 . Shaoshuai Zhang, Vivek Karihaloo, and Panruo Wu. 2020. Basic Linear Algebra Operations on TensorCore GPU. In 2020 IEEE\/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA). IEEE, 44\u201352."}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","location":"Lemont IL USA","acronym":"ICPP 2021"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473522","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3473522","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:23Z","timestamp":1750191443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473522"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":25,"alternative-id":["10.1145\/3472456.3473522","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3473522","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}