{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T09:08:03Z","timestamp":1768208883219,"version":"3.49.0"},"reference-count":40,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T00:00:00Z","timestamp":1758585600000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000083","name":"Directorate for Computer and Information Science and Engineering","doi-asserted-by":"publisher","award":["2004541"],"award-info":[{"award-number":["2004541"]}],"id":[{"id":"10.13039\/100000083","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006132","name":"Office of Science","doi-asserted-by":"publisher","award":["17-SC-20-SC"],"award-info":[{"award-number":["17-SC-20-SC"]}],"id":[{"id":"10.13039\/100006132","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006393","name":"Air Force Civil Engineer Center","doi-asserted-by":"publisher","award":["FA8750-19-2-1000"],"award-info":[{"award-number":["FA8750-19-2-1000"]}],"id":[{"id":"10.13039\/100006393","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:p> Performing a variety of numerical computations efficiently and, at the same time, in a portable fashion requires both an overarching design and a number of implementation strategies. 
All of these are exemplified below as we present the transition of the PLASMA numerical library from relying on dependence-driven large tasks to utilizing fine-grain tasking and offload to hardware accelerators while keeping its core dependence sets: OpenMP source code pragmas and runtime for most system-level functionality, and basic low-level numerical kernels provided directly by hardware vendors or open source projects with vendor contributions. We also present new algorithmic methods and their efficient parallel implementations, including fine-grained tasking for eigen-spectrum slicing and offload for mixed-precision eigenvalue refinement. We provide performance, scaling, and numerical results showing sizable gains over the available solutions from either open source or vendor-provided packages. <\/jats:p>","DOI":"10.1177\/10943420241281050","type":"journal-article","created":{"date-parts":[[2024,9,23]],"date-time":"2024-09-23T21:00:10Z","timestamp":1727125210000},"page":"671-691","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload"],"prefix":"10.1177","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0089-6965","authenticated-orcid":false,"given":"Piotr","family":"Luszczek","sequence":"first","affiliation":[{"name":"MIT Lincoln Lab, LLSC, MIT Lincoln Laboratory, Lexington, MA, USA"},{"name":"Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA"}]},{"given":"Anthony","family":"Castaldo","sequence":"additional","affiliation":[{"name":"Technical Development, Synopsys, Inc., Sunnyvale, CA, USA"}]},{"given":"Yaohung M","family":"Tsai","sequence":"additional","affiliation":[{"name":"ML Compilers and AI Accelerators, Meta, Inc., Menlo Park, 
CA, USA"}]},{"given":"Daniel","family":"Mishler","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA"}]},{"given":"Jack","family":"Dongarra","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA"},{"name":"Computer Science and Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN, USA"},{"name":"Applied Mathematics, University of Manchester, Manchester, UK"}]}],"member":"179","published-online":{"date-parts":[[2024,9,23]]},"reference":[{"key":"bibr40-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-385963-1.00034-4"},{"key":"bibr2-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719604"},{"key":"bibr3-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1103\/RevModPhys.35.690"},{"key":"bibr4-10943420241281050","unstructured":"Architecture Review Board (2021) OpenMP application programming interface. Version 5.2, November."},{"key":"bibr5-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/aa819a"},{"key":"bibr6-10943420241281050","doi-asserted-by":"crossref","unstructured":"Bosilca G, Bouteiller A, Danalis A, et al. (2011) Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. 
In: 12th IEEE international workshop on parallel and distributed scientific and engineering computing (PDSEC\u201911), Anchorage, Alaska, 20 May 2011.","DOI":"10.1109\/IPDPS.2011.299"},{"key":"bibr7-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1301"},{"key":"bibr8-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRev.121.659"},{"key":"bibr9-10943420241281050","volume-title":"A Newton-Schulz Variant for Improving the Initial Convergence in Matrix Sign Computation","author":"Chen J","year":"2014"},{"issue":"136","key":"bibr10-10943420241281050","first-page":"772","volume":"30","author":"Daniel JW","year":"1976","journal-title":"Mathematics of Computation"},{"key":"bibr11-10943420241281050","first-page":"116","volume":"3","author":"Demmel JW","year":"1995","journal-title":"Electronic Transactions on Numerical Analysis"},{"key":"bibr12-10943420241281050","volume-title":"A Testing Infrastructure for LAPACK\u2019s Symmetric Eigensolvers","author":"Demmel JW","year":"2006"},{"key":"bibr13-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1137\/080731992"},{"key":"bibr14-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1137\/13092157X"},{"key":"bibr15-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1145\/3264491"},{"key":"bibr16-10943420241281050","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.330"},{"key":"bibr17-10943420241281050","doi-asserted-by":"crossref","unstructured":"Gates M, Kurzak J, Charara A, et al. (2019) SLATE: design of a modern distributed and accelerated linear algebra library. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, Denver, CO, 17\u201322 November 2019, pp. 
1\u201318.","DOI":"10.1145\/3295500.3356223"},{"key":"bibr18-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2005.08.009"},{"key":"bibr19-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-005-0615-4"},{"key":"bibr20-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479892241287"},{"key":"bibr21-10943420241281050","volume-title":"Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems","author":"Gu M","year":"1994"},{"key":"bibr22-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063394"},{"key":"bibr23-10943420241281050","doi-asserted-by":"crossref","unstructured":"Haidar A, Ltaief H, Luszczek P, et al. (2012) A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction. In: Proceedings of the IEEE international parallel and distributed processing symposium, Shanghai, China, 21\u201325 May 2012.","DOI":"10.1109\/IPDPS.2012.13"},{"key":"bibr24-10943420241281050","doi-asserted-by":"crossref","unstructured":"Haidar A, Kurzak J, Luszczek P (2013) An improved parallel singular value algorithm and its implementation for multicore hardware. In: SC13, the international conference for high performance computing, networking, storage and analysis. Denver, Colorado, 17\u201321 November 2013.","DOI":"10.1145\/2503210.2503292"},{"key":"bibr25-10943420241281050","doi-asserted-by":"crossref","unstructured":"Haidar A, Luszczek P, Dongarra J (2014) New algorithm for computing eigenvectors of the symmetric eigenvalue problem. In: The 15th IEEE international workshop on parallel and distributed scientific and engineering computing (PDSEC 2014). 
Phoenix, AZ, 23 May 2014.","DOI":"10.1109\/IPDPSW.2014.130"},{"key":"bibr26-10943420241281050","unstructured":"ISO\/IEC 14882 (2023) Information technology \u2013 programming languages \u2013 C++."},{"key":"bibr27-10943420241281050","unstructured":"ISO\/IEC 1539-1 (2018) Information technology \u2013 programming languages \u2013 Fortran."},{"key":"bibr28-10943420241281050","unstructured":"ISO\/IEC 9899 (2023) Information technology \u2013 programming languages \u2013 C."},{"key":"bibr41-10943420241281050","doi-asserted-by":"publisher","DOI":"10.14708\/ma.v2i2.1048"},{"key":"bibr30-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRev.97.1474"},{"key":"bibr31-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31464-3_67"},{"key":"bibr32-10943420241281050","doi-asserted-by":"crossref","unstructured":"Luszczek P, Ltaief H, Dongarra J (2011) Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. In: IPDPS 2011: IEEE international parallel and distributed processing symposium, Anchorage, Alaska, 16\u201320 May 2011.","DOI":"10.1109\/IPDPS.2011.91"},{"key":"bibr33-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1145\/1377603.1377611"},{"key":"bibr34-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1007\/s13160-018-0310-3"},{"key":"bibr35-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827500381239"},{"key":"bibr36-10943420241281050","doi-asserted-by":"crossref","unstructured":"Sid-Lakhdar W, Cayrols S, Bielich D, et al. (2023) PAQR: pivoting avoiding QR factorization. In: Proceedings of 36th IEEE international parallel & distributed processing symposium (IPDPS), Lyon, France, 30 May\u20133 June 2022. 
Best paper nominee.","DOI":"10.1109\/IPDPS54959.2023.00040"},{"key":"bibr37-10943420241281050","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.102571"},{"key":"bibr38-10943420241281050","doi-asserted-by":"crossref","unstructured":"Tsai Y, Luszczek P, Dongarra J (2022) Mixed-precision algorithm for finding selected eigenvalues and eigenvectors of symmetric and Hermitian matrices. In: ScalAH22: 13th workshop on latest advances in scalable algorithms for large-scale heterogeneous systems, Dallas, Texas, 13 November 2022, pp. 1\u201310.","DOI":"10.1109\/ScalAH56622.2022.00011"},{"key":"bibr39-10943420241281050","volume-title":"Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices","author":"Volkov V","year":"2007"},{"key":"bibr42-10943420241281050","author":"Zhang J","year":"2003","journal-title":"Society for Industrial and Applied Mathematics"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241281050","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420241281050","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241281050","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241281050","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T23:53:01Z","timestamp":1740873181000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420241281050"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["10.1177\/10943420241281050"],"URL":"https:\/\/doi.org\/10.1177\/10943420241281050","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,23]]}}}