{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T12:07:29Z","timestamp":1781006849692,"version":"3.54.1"},"reference-count":154,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T00:00:00Z","timestamp":1774310400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T00:00:00Z","timestamp":1774310400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"crossref","award":["DE-AC05-00OR22725"],"award-info":[{"award-number":["DE-AC05-00OR22725"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    The explosive demand for artificial intelligence (AI) workloads has led to a significant increase in silicon area dedicated to lower-precision computations on recent high-performance computing hardware designs. 
However, mixed-precision capabilities, which can achieve performance improvements of up to 8\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\times$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>\u00d7<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    compared to double-precision in extreme compute-intensive workloads, remain largely untapped in most scientific applications. A growing number of efforts have shown that mixed-precision algorithmic innovations can deliver superior performance without sacrificing accuracy. These developments should prompt computational scientists to seriously consider whether their scientific modeling and simulation applications could benefit from the acceleration offered by new hardware and mixed-precision algorithms. In this survey, we (1) review progress across diverse scientific domains\u2014fluid dynamics, weather and climate, quantum chemistry, and computational genomics\u2014that have begun adopting mixed-precision strategies; (2) examine state-of-the-art algorithmic techniques such as iterative refinement, splitting and emulation schemes, and adaptive precision solvers; (3) assess their implications for accuracy, performance, and resource utilization; and (4) survey the emerging software ecosystem that enables mixed-precision methods at scale. We conclude with perspectives and recommendations on cross-cutting opportunities, domain-specific challenges, and the role of co-design between application scientists, numerical analysts, and computer scientists. 
Collectively, this survey underscores that mixed-precision numerics can reshape computational science by aligning algorithms with the evolving landscape of hardware capabilities.\n                  <\/jats:p>","DOI":"10.1007\/s11227-026-08264-4","type":"journal-article","created":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T01:39:25Z","timestamp":1774316365000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Mixed-precision numerics in scientific applications: survey and perspectives"],"prefix":"10.1007","volume":"82","author":[{"given":"Aditya","family":"Kashi","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao","family":"Lu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wesley","family":"Brewer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Rogers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Matheson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mallikarjun","family":"Shankar","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Feiyi","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,3,24]]},"reference":[{"key":"8264_CR1","volume-title":"Numerical methods for scientists and engineers","author":"RW Hamming","year":"1986","unstructured":"Hamming RW (1986) Numerical methods for scientists and engineers, 2nd edn. 
Dover Publications, New York","edition":"2"},{"issue":"1","key":"8264_CR2","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1145\/103162.103163","volume":"23","author":"D Goldberg","year":"1991","unstructured":"Goldberg D (1991) What every computer scientist should know about floating-point arithmetic. ACM Comput Surv (CSUR) 23(1):5\u201348. https:\/\/doi.org\/10.1145\/103162.103163","journal-title":"ACM Comput Surv (CSUR)"},{"issue":"1","key":"8264_CR3","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.106.015308","volume":"106","author":"M Lehmann","year":"2022","unstructured":"Lehmann M, Krause MJ, Amati G, Sega M, Harting J, Gekle S (2022) Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats. Phys Rev E 106(1):015308","journal-title":"Phys Rev E"},{"issue":"12","key":"8264_CR4","doi-asserted-by":"publisher","first-page":"2295","DOI":"10.1109\/JPROC.2017.2761740","volume":"105","author":"V Sze","year":"2017","unstructured":"Sze V, Chen Y-H, Yang T-J, Emer JS (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295\u20132329","journal-title":"Proc IEEE"},{"key":"8264_CR5","doi-asserted-by":"crossref","unstructured":"Sakamoto R, Kondo M, Fujita K, Ichimura T, Nakajima K (2020) The effectiveness of low-precision floating arithmetic on numerical codes: a case study on power consumption. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp 199\u2013206","DOI":"10.1145\/3368474.3368492"},{"issue":"9","key":"8264_CR6","first-page":"3","volume":"58","author":"E Carson","year":"2025","unstructured":"Carson E, Mary T (2025) Mixed-precision computing: high accuracy with low precision. 
SIAM News 58(9):3","journal-title":"SIAM News"},{"key":"8264_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2023.104746","volume":"181","author":"A Netti","year":"2023","unstructured":"Netti A, Peng Y, Omland P, Paulitsch M, Parra J, Espinosa G, Agarwal U, Chan A, Pattabiraman K (2023) Mixed precision support in hpc applications: what about reliability? J Parallel Distrib Comput 181:104746. https:\/\/doi.org\/10.1016\/j.jpdc.2023.104746","journal-title":"J Parallel Distrib Comput"},{"issue":"4","key":"8264_CR8","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1109\/MCSE.2022.3215477","volume":"24","author":"H Ltaief","year":"2022","unstructured":"Ltaief H, Genton MG, Gratadour D, Keyes DE, Ravasi M (2022) Responsibly reckless matrix algorithms for HPC scientific applications. Comput Sci Eng 24(4):12\u201322. https:\/\/doi.org\/10.1109\/MCSE.2022.3215477","journal-title":"Comput Sci Eng"},{"key":"8264_CR9","unstructured":"Morgan TP. AMD Previews \"Turin\" Epyc CPUs, Expands Instinct GPU roadmap. https:\/\/www.nextplatform.com\/2024\/06\/03\/amd-previews-turin-epyc-cpus-expands-instinct-gpu-roadmap\/ Accessed 30 Oct 2024"},{"key":"8264_CR10","unstructured":"Morgan T.P Nvidia unfolds GPU, interconnect roadmaps out to 2027. https:\/\/www.nextplatform.com\/2024\/06\/02\/nvidia-unfolds-gpu-interconnect-roadmaps-out-to-2027\/ Accessed 30 Oct 2024"},{"key":"8264_CR11","unstructured":"Dongarra J, Gunnels J, Bayraktar H, Haidar A, Ernst D (2024) Hardware trends impacting floating-point computations in scientific applications. https:\/\/arxiv.org\/abs\/2411.12090"},{"key":"8264_CR12","unstructured":"Cook JD. What is Bfloat16? https:\/\/www.johndcook.com\/blog\/2018\/11\/15\/bfloat16\/ Accessed 05 Sept 2024"},{"key":"8264_CR13","unstructured":"Kalamkar D, Mudigere D, Mellempudi N, Das D, Banerjee K, Avancha S, Vooturi DT, Jammalamadaka N, Huang J, Yuen H et al (2019) A study of bfloat16 for deep learning training. 
Preprint at arXiv:1905.12322"},{"key":"8264_CR14","unstructured":"Kharya P. What is the TensorFloat-32 precision format? https:\/\/blogs.nvidia.com\/blog\/tensorfloat-32-precision-format\/ Accessed 05 Sept 2024"},{"key":"8264_CR15","unstructured":"NVIDIA: Using FP8 with transformer engine. https:\/\/docs.nvidia.com\/deeplearning\/transformer-engine\/user-guide\/examples\/fp8_primer.html Accessed 09 Sept 2024"},{"key":"8264_CR16","doi-asserted-by":"publisher","unstructured":"Abdelfattah A, Anzt H, Ayala A, Boman E, Carson E, Cayrols S, Cojean T, Dongarra J, Falgout R, Gates M, Gruetzmacher T, Higham N, Kruger S, Li X, Lindquist N, Liu Y, Loe J, Luszczek P, Nayak P, Osei-Kuffuor D, Pranesh S, Rajamanickam S, Ribizel T, Smith B, Swirydowicz K, Thomas S, Tomov S, Tsai Y, Yamazaki I, Yang UM (2021) Advances in mixed precision algorithms: 2021 edition. In: Technical Report LLNL-TR-825909, Lawrence Livermore National Lab. (LLNL), Livermore. https:\/\/doi.org\/10.2172\/1814677","DOI":"10.2172\/1814677"},{"issue":"4","key":"8264_CR17","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1177\/10943420211003313","volume":"35","author":"A Abdelfattah","year":"2021","unstructured":"Abdelfattah A, Anzt H, Boman EG, Carson E, Cojean T, Dongarra J, Fox A, Gates M, Higham NJ, Li XS, Loe J, Luszczek P, Pranesh S, Rajamanickam S, Ribizel T, Smith BF, Swirydowicz K, Thomas S, Tomov S, Tsai YM, Yang UM (2021) A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. Int J High Perform Comput Appl 35(4):344\u2013369. https:\/\/doi.org\/10.1177\/10943420211003313","journal-title":"Int J High Perform Comput Appl"},{"key":"8264_CR18","doi-asserted-by":"crossref","unstructured":"Anzt H (2024) xSDK-multiprecision final report for subcontract partner KIT. In: Technical Report LLNL-SR-861087, Lawrence Livermore National Lab. 
(LLNL), Livermore","DOI":"10.2172\/2328216"},{"key":"8264_CR19","unstructured":"Abdelfattah A, Anzt H, Boman E, Carson E, Cojean T, Dongarra J, Gates M, Gruetzmacher T, Higham N.J., Li S, Lindquist N, Liu Y, Loe J, Luszczek P, Nayak P, Pranesh S, Rajamanickam S, Ribizel T, Smith B, Swirydowicz K, Thomas S, Tomov S, Tsai Y, Yamazaki I, Yang UM (2020) A survey of numerical methods utilizing mixed precision arithmetic. SLATE Working Notes 15, ICL-UT-20-08, Innovative Computing Laboratory, University of Tennessee, Knoxville. https:\/\/icl.utk.edu\/publications\/swan-015 Accessed 20 Aug 2024"},{"key":"8264_CR20","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1017\/S0962492922000022","volume":"31","author":"NJ Higham","year":"2022","unstructured":"Higham NJ, Mary T (2022) Mixed precision algorithms in numerical linear algebra. Acta Numer 31:347\u2013414. https:\/\/doi.org\/10.1017\/S0962492922000022","journal-title":"Acta Numer"},{"key":"8264_CR21","doi-asserted-by":"crossref","unstructured":"Karp M, Stanly R, Mukha T, Galimberti L, Toosi S, Song H, Dalcin L, Rezaeiravesh S, Jansson N, Markidis S, Parsani M, Bose S, Lele S, Schlatter P (2025) Effects of lower floating-point precision on scale-resolving numerical simulations of turbulence. https:\/\/arxiv.org\/abs\/2506.05150","DOI":"10.2139\/ssrn.5290823"},{"key":"8264_CR22","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1007\/978-3-031-32041-5_10","volume-title":"High Performance Computing","author":"RD Budiardja","year":"2023","unstructured":"Budiardja RD, Berrill M, Eisenbach M, Jansen GR, Joubert W, Nichols S, Rogers DM, Tharrington A, Bronson Messer OE (2023) Ready for the frontier: Preparing applications for the world\u2019s first exascale system. In: Bhatele A, Hammond J, Baboulin M, Kruse C (eds) High Performance Computing. Springer, Cham, pp 182\u2013201. 
https:\/\/doi.org\/10.1007\/978-3-031-32041-5_10"},{"key":"8264_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.future.2023.10.006","volume":"152","author":"F Brogi","year":"2024","unstructured":"Brogi F, Bn\u00e0 S, Boga G, Amati G, Ongaro TE, Cerminara M (2024) On floating point precision in computational fluid dynamics using openfoam. Futur Gener Comput Syst 152:1\u201316","journal-title":"Futur Gener Comput Syst"},{"issue":"14","key":"8264_CR24","doi-asserted-by":"publisher","first-page":"1505","DOI":"10.1103\/PhysRevLett.56.1505","volume":"56","author":"U Frisch","year":"1986","unstructured":"Frisch U, Hasslacher B, Pomeau Y (1986) Lattice-gas automata for the Navier-Stokes equation. Phys Rev Lett 56(14):1505\u20131508. https:\/\/doi.org\/10.1103\/PhysRevLett.56.1505","journal-title":"Phys Rev Lett"},{"key":"8264_CR25","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1007\/s10596-020-10028-9","volume":"25","author":"JE McClure","year":"2021","unstructured":"McClure JE, Li Z, Berrill M, Ramstad T (2021) The LBPM software package for simulating multiphase flow on digital images of porous rocks. Comput Geosci 25:871\u2013895. https:\/\/doi.org\/10.1007\/s10596-020-10028-9","journal-title":"Comput Geosci"},{"key":"8264_CR26","doi-asserted-by":"publisher","unstructured":"Walden A, Nielsen E, Diskin B, Zubair M (2019) A mixed precision multicolor point-implicit solver for unstructured grids on GPUs. In: 2019 IEEE\/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), pp 23\u201330. https:\/\/doi.org\/10.1109\/IA349570.2019.00010","DOI":"10.1109\/IA349570.2019.00010"},{"issue":"4","key":"8264_CR27","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1016\/S0167-8191(00)00075-2","volume":"27","author":"WD Gropp","year":"2001","unstructured":"Gropp WD, Kaushik DK, Keyes DE, Smith BF (2001) High-performance parallel implicit cfd. Parallel Comput 27(4):337\u2013362. 
https:\/\/doi.org\/10.1016\/S0167-8191(00)00075-2","journal-title":"Parallel Comput"},{"issue":"2","key":"8264_CR28","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1088\/0067-0049\/217\/2\/24","volume":"217","author":"EE Schneider","year":"2015","unstructured":"Schneider EE, Robertson BE (2015) Cholla: a new massively parallel hydrodynamics code for astrophysical simulation. Astrophys J Suppl Ser 217(2):24. https:\/\/doi.org\/10.1088\/0067-0049\/217\/2\/24","journal-title":"Astrophys J Suppl Ser"},{"issue":"1","key":"8264_CR29","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s42967-021-00129-2","volume":"5","author":"SE Field","year":"2021","unstructured":"Field SE, Gottlieb S, Grant ZJ, Isherwood LF, Khanna G (2021) A GPU-accelerated mixed-precision WENO method for extremal black hole and gravitational wave physics computations. Commun Appl Math Comput 5(1):97\u2013115. https:\/\/doi.org\/10.1007\/s42967-021-00129-2","journal-title":"Commun Appl Math Comput"},{"key":"8264_CR30","doi-asserted-by":"publisher","unstructured":"Ravikumar K, Appelhans D, Yeung PK (2019) GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201919. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3295500.3356209","DOI":"10.1145\/3295500.3356209"},{"key":"8264_CR31","doi-asserted-by":"crossref","unstructured":"Wilfong B, Radhakrishnan A, Berre HL, Tselepidis N, Dorschner B, Budiardja R, Cornille B, Abbott S, Sch\u00e4fer F, Bryngelson SH (2025) Simulating many-engine spacecraft: exceeding 100 trillion grid points via information geometric regularization and the MFC flow solver. 
https:\/\/arxiv.org\/abs\/2505.07392","DOI":"10.1145\/3712285.3771783"},{"key":"8264_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2025.107990","volume":"174","author":"Y Chen","year":"2026","unstructured":"Chen Y, de Oliveira CP, Bientinesi P, Jansson N, Iakymchuk R (2026) Enabling mixed-precision in spectral element codes. Futur Gener Comput Syst 174:107990. https:\/\/doi.org\/10.1016\/j.future.2025.107990","journal-title":"Futur Gener Comput Syst"},{"key":"8264_CR33","doi-asserted-by":"publisher","unstructured":"Denis C, De Oliveira Castro P, Petit E (2016) Verificarlo: checking floating point accuracy through Monte Carlo arithmetic. In: 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp 55\u201362. https:\/\/doi.org\/10.1109\/ARITH.2016.31","DOI":"10.1109\/ARITH.2016.31"},{"issue":"16","key":"8264_CR34","doi-asserted-by":"publisher","first-page":"6301","DOI":"10.5194\/gmd-17-6301-2024","volume":"17","author":"S Chen","year":"2024","unstructured":"Chen S, Zhang Y, Wang Y, Liu Z, Li X, Xue W (2024) Mixed-precision computing in the grist dynamical core for weather and climate modelling. Geosci Model Dev 17(16):6301\u20136318. https:\/\/doi.org\/10.5194\/gmd-17-6301-2024","journal-title":"Geosci Model Dev"},{"issue":"729","key":"8264_CR35","doi-asserted-by":"publisher","first-page":"1590","DOI":"10.1002\/qj.3754","volume":"146","author":"L Saffin","year":"2020","unstructured":"Saffin L, Hatfield S, D\u00fcben P (2020) Palmer T Reduced-precision parametrization: lessons from an intermediate-complexity atmospheric model. Q J R Meteorol Soc 146(729):1590\u20131607. 
https:\/\/doi.org\/10.1002\/qj.3754","journal-title":"Q J R Meteorol Soc"},{"issue":"741","key":"8264_CR36","doi-asserted-by":"publisher","first-page":"4358","DOI":"10.1002\/qj.4181","volume":"147","author":"STK Lang","year":"2021","unstructured":"Lang STK, Dawson A, Diamantakis M, Dueben P, Hatfield S, Leutbecher M, Palmer T, Prates F, Roberts CD, Sandu I, Wedi N (2021) More accuracy with less precision. Q J R Meteorol Soc 147(741):4358\u20134370. https:\/\/doi.org\/10.1002\/qj.4181","journal-title":"Q J R Meteorol Soc"},{"issue":"9","key":"8264_CR37","doi-asserted-by":"publisher","first-page":"e2022MS003148","DOI":"10.1029\/2022MS003148","volume":"14","author":"J Ackmann","year":"2022","unstructured":"Ackmann J, Dueben PD, Palmer T, Smolarkiewicz PK (2022) Mixed-precision for linear solvers in global geophysical flows. J Adv Model Earth Syst 14(9):e2022MS003148. https:\/\/doi.org\/10.1029\/2022MS003148","journal-title":"J Adv Model Earth Syst"},{"key":"8264_CR38","doi-asserted-by":"publisher","unstructured":"Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia. https:\/\/doi.org\/10.1137\/1.9780898718003","DOI":"10.1137\/1.9780898718003"},{"issue":"1","key":"8264_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1137\/21M1465032","volume":"45","author":"M Fasi","year":"2023","unstructured":"Fasi M, Higham NJ, Lopez F, Mary T, Mikaitis M (2023) Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores. SIAM J Sci Comput 45(1):1\u201319. 
https:\/\/doi.org\/10.1137\/21M1465032","journal-title":"SIAM J Sci Comput"},{"key":"8264_CR40","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1016\/j.cpc.2019.07.002","volume":"244","author":"CM Maynard","year":"2019","unstructured":"Maynard CM, Walters DN (2019) Mixed-precision arithmetic in the endgame dynamical core of the unified model, a numerical weather prediction and climate model code. Comput Phys Commun 244:69\u201375. https:\/\/doi.org\/10.1016\/j.cpc.2019.07.002","journal-title":"Comput Phys Commun"},{"issue":"2","key":"8264_CR41","doi-asserted-by":"publisher","first-page":"2021","DOI":"10.1029\/2021MS002684","volume":"14","author":"M Kl\u00f6wer","year":"2022","unstructured":"Kl\u00f6wer M, Hatfield S, Croci M, D\u00fcben PD, Palmer TN (2022) Fluid simulations accelerated with 16 bits: Approaching 4x speedup on A64FX by squeezing ShallowWaters.jl into float16. J Adv Model Earth Syst 14(2):2021\u20132684. https:\/\/doi.org\/10.1029\/2021MS002684","journal-title":"J Adv Model Earth Syst"},{"issue":"4","key":"8264_CR42","doi-asserted-by":"publisher","first-page":"783","DOI":"10.1137\/0914050","volume":"14","author":"NJ Higham","year":"1993","unstructured":"Higham NJ (1993) The accuracy of floating point summation. SIAM J Sci Comput 14(4):783\u2013799","journal-title":"SIAM J Sci Comput"},{"key":"8264_CR43","doi-asserted-by":"publisher","unstructured":"Abdulah S, Ltaief H, Sun Y, Genton MG, Keyes DE (2019) Geostatistical modeling and prediction using mixed precision tile cholesky factorization. In: 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp 152\u2013162. 
https:\/\/doi.org\/10.1109\/HiPC.2019.00028","DOI":"10.1109\/HiPC.2019.00028"},{"issue":"4","key":"8264_CR44","doi-asserted-by":"publisher","first-page":"964","DOI":"10.1109\/TPDS.2021.3084071","volume":"33","author":"S Abdulah","year":"2022","unstructured":"Abdulah S, Cao Q, Pei Y, Bosilca G, Dongarra J, Genton MG, Keyes DE, Ltaief H, Sun Y (2022) Accelerating geostatistical modeling and prediction with mixed-precision computations: a high-productivity approach with parsec. IEEE Trans Parallel Distrib Syst 33(4):964\u2013976. https:\/\/doi.org\/10.1109\/TPDS.2021.3084071","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"8264_CR45","doi-asserted-by":"publisher","unstructured":"Cao Q, Abdulah S, Ltaief H, Genton MG, Keyes D, Bosilca G (2023) Reducing data motion and energy consumption of geospatial modeling applications using automated precision conversion. In: 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp 330\u2013342. https:\/\/doi.org\/10.1109\/CLUSTER52292.2023.00035","DOI":"10.1109\/CLUSTER52292.2023.00035"},{"key":"8264_CR46","doi-asserted-by":"publisher","unstructured":"Abdulah S, Baker AH, Bosilca G, Cao Q, Castruccio S, Genton MG, Keyes DE, Khalid Z, Ltaief H, Song Y, Stenchikov GL, Sun Y (2024) Boosting earth system model outputs and saving petabytes in their storage using exascale climate emulators. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. SC \u201924, Atlanta, GA, USA. https:\/\/doi.org\/10.1109\/SC41406.2024.00008","DOI":"10.1109\/SC41406.2024.00008"},{"issue":"1","key":"8264_CR47","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1021\/ct300321a","volume":"9","author":"AV Titov","year":"2013","unstructured":"Titov AV, Ufimtsev IS, Luehr N, Martinez TJ (2013) Generating efficient quantum chemistry codes for novel architectures. J Chem Theory Comput 9(1):213\u2013221. 
https:\/\/doi.org\/10.1021\/ct300321a","journal-title":"J Chem Theory Comput"},{"key":"8264_CR48","doi-asserted-by":"publisher","unstructured":"Das S, Motamarri P, Gavini V, Turcksin B, Li YW, Leback B (2019) Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 pflops simulation of a metallic dislocation system. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201919. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3295500.3357157","DOI":"10.1145\/3295500.3357157"},{"key":"8264_CR49","doi-asserted-by":"publisher","unstructured":"Das S, Kanungo B, Subramanian V, Panigrahi G, Motamarri P, Rogers D, Zimmerman P, Gavini V (2023) Large-scale materials modeling at quantum accuracy: Ab initio simulations of quasicrystals and interacting extended defects in metallic alloys. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201923. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3581784.3627037","DOI":"10.1145\/3581784.3627037"},{"issue":"24","key":"8264_CR50","doi-asserted-by":"publisher","first-page":"10826","DOI":"10.1021\/acs.jctc.4c00938","volume":"20","author":"W Dawson","year":"2024","unstructured":"Dawson W, Ozaki K, Domke J, Nakajima T (2024) Reducing numerical precision requirements in quantum chemistry calculations. J Chem Theory Comput 20(24):10826\u201310837. 
https:\/\/doi.org\/10.1021\/acs.jctc.4c00938","journal-title":"J Chem Theory Comput"},{"key":"8264_CR51","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1016\/j.cpc.2016.07.013","volume":"211","author":"M Eisenbach","year":"2017","unstructured":"Eisenbach M, Larkin J, Lutjens J, Rennich S, Rogers JH (2017) Gpu acceleration of the locally selfconsistent multiple scattering code for first principles calculation of the ground state and statistical physics of materials. Comput Phys Commun 211:2\u20137. https:\/\/doi.org\/10.1016\/j.cpc.2016.07.013","journal-title":"Comput Phys Commun"},{"key":"8264_CR52","doi-asserted-by":"publisher","unstructured":"Malaya N, Messer B, Glenski J, Georgiadou A, Lietz J, Gottiparthi K, Day M, Chen J, Rood J, Esclapez L, White III J, Jansen GR, Curtis N, Nichols S, Kurzak J, Chalmers N, Freitag C, Bauman P, Fanfarillo A, Budiardja RD, Papatheodore T, Frontiere N, Mcdougall D, Norman M, Sreepathi S, Roth P, Bykov D, Wolfe N, Mullowney P, Eisenbach M, Henry De Frahan MT, Joubert W (2023) Experiences readying applications for exascale. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201923. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3581784.3607065","DOI":"10.1145\/3581784.3607065"},{"issue":"12","key":"8264_CR53","doi-asserted-by":"publisher","first-page":"7260","DOI":"10.1021\/acs.jctc.2c00632","volume":"18","author":"Y Tian","year":"2022","unstructured":"Tian Y, Xie Z, Luo Z, Ma H (2022) Mixed-precision implementation of the density matrix renormalization group. J Chem Theory Comput 18(12):7260\u20137271. 
https:\/\/doi.org\/10.1021\/acs.jctc.2c00632","journal-title":"J Chem Theory Comput"},{"key":"8264_CR54","doi-asserted-by":"crossref","unstructured":"Joubert W, Weighill D, Kainer D, Climer S, Justice A, Fagnan K, Jacobson D (2018) Attacking the opioid epidemic: Determining the epistatic and pleiotropic genetic architectures for chronic pain and opioid addiction. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, pp 717\u2013730","DOI":"10.1109\/SC.2018.00060"},{"key":"8264_CR55","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.parco.2019.02.003","volume":"84","author":"W Joubert","year":"2019","unstructured":"Joubert W, Nance J, Climer S, Weighill D, Jacobson D (2019) Parallel accelerated custom correlation coefficient calculations for genomics applications. Parallel Comput 84:15\u201323. https:\/\/doi.org\/10.1016\/j.parco.2019.02.003","journal-title":"Parallel Comput"},{"key":"8264_CR56","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1016\/j.parco.2018.03.009","volume":"75","author":"W Joubert","year":"2018","unstructured":"Joubert W, Nance J, Weighill D, Jacobson D (2018) Parallel accelerated vector similarity calculations for genomics applications. Parallel Comput 75:130\u2013145","journal-title":"Parallel Comput"},{"key":"8264_CR57","doi-asserted-by":"publisher","unstructured":"Ltaief H, Alomairy R, Cao Q, Ren J, Slim L, Kurth T, Dorschner B, Bougouffa S, Abdelkhalak R, Keyes DE (2024) Toward capturing genetic epistasis from multivariate genome-wide association studies using mixed-precision kernel ridge regression. In: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1\u201312. 
https:\/\/doi.org\/10.1109\/SC41406.2024.00012","DOI":"10.1109\/SC41406.2024.00012"},{"issue":"5","key":"8264_CR58","doi-asserted-by":"publisher","DOI":"10.1063\/5.0146456","volume":"35","author":"S Bhushan","year":"2023","unstructured":"Bhushan S, Burgreen GW, Brewer W, Dettwiller ID (2023) Assessment of neural network augmented Reynolds averaged Navier Stokes turbulence model in extrapolation modes. Phys Fluids 35(5):055129","journal-title":"Phys Fluids"},{"key":"8264_CR59","unstructured":"Meena MG, Liousas D, Simin AD, Kashi A, Brewer WH, Riley JJ, Bruyn Kops SM (2024) Machine-learned closure of URANS for stably stratified turbulence: connecting physical timescales & data hyperparameters of deep time-series models"},{"issue":"11","key":"8264_CR60","doi-asserted-by":"publisher","first-page":"6069","DOI":"10.1029\/2018GL081646","volume":"46","author":"A Pal","year":"2019","unstructured":"Pal A, Mahajan S, Norman MR (2019) Using deep neural networks as cost-effective surrogate models for super-parameterized E3SM radiative transfer. Geophys Res Lett 46(11):6069\u20136079. https:\/\/doi.org\/10.1029\/2018GL081646","journal-title":"Geophys Res Lett"},{"issue":"1","key":"8264_CR61","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1038\/s41524-019-0189-9","volume":"5","author":"C Nyshadham","year":"2019","unstructured":"Nyshadham C, Rupp M, Bekker B, Shapeev AV, Mueller T, Rosenbrock CW, Cs\u00e1nyi G, Wingate DW, Hart GL (2019) Machine-learned multi-system surrogate models for materials prediction. npj Comput Mater 5(1):51. https:\/\/doi.org\/10.1038\/s41524-019-0189-9","journal-title":"npj Comput Mater"},{"issue":"1","key":"8264_CR62","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1038\/s41524-024-01343-1","volume":"10","author":"T Koker","year":"2024","unstructured":"Koker T, Quigley K, Taw E, Tibbetts K, Li L (2024) Higher-order equivariant neural networks for charge density prediction in materials. Nat Comput Mater 10(1):14. 
https:\/\/doi.org\/10.1038\/s41524-024-01343-1","journal-title":"Nat Comput Mater"},{"key":"8264_CR63","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1007\/978-3-031-19803-8_32","volume-title":"Computer Vision - ECCV 2022","author":"A Levy","year":"2022","unstructured":"Levy A, Poitevin F, Martel J, Nashed Y, Peck A, Miolane N, Ratner D, Dunne M, Wetzstein G (2022) CryoAI: Amortized inference of poses for ab initio reconstruction of 3D molecular volumes from real Cryo-EM images. In: Avidan S, Brostow G, Ciss\u00e9 M, Farinella GM, Hassner T (eds) Computer Vision - ECCV 2022. Springer, Switzerland, pp 540\u2013557"},{"key":"8264_CR64","doi-asserted-by":"publisher","unstructured":"Or A, Jain A, Vega-Myhre D, Cai J, Hernandez CD, Zheng Z, Guessous D, Kuznetsov V, Puhrsch C, Saroufim M, Rao S, Tran T, Samard\u017ei\u0107 A (2025) TorchAO: PyTorch-native training-to-serving model optimization. In: Proceedings of ICML 2025 Workshop on Championing Open-source Development in Machine Learning (CODEML \u201925). https:\/\/doi.org\/10.48550\/arXiv.2507.16099","DOI":"10.48550\/arXiv."},{"key":"8264_CR65","doi-asserted-by":"crossref","unstructured":"Brewer W, Geyer C, Kleiner D, Horne C (2021) Streaming detection and classification performance of a POWER9 edge supercomputer. In: 2021 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, pp 1\u20137","DOI":"10.1109\/HPEC49654.2021.9622852"},{"key":"8264_CR66","unstructured":"Tu R, White C, Kossaifi J, Bonev B, Kovachki N, Pekhimenko G, Azizzadenesheli K, Anandkumar A (2024) Guaranteed approximation bounds for mixed-precision neural operators"},{"key":"8264_CR67","doi-asserted-by":"crossref","unstructured":"Wang X, Tsaris A, Liu S, Choi J-Y, Fan M, Zhang W, Yin J, Ashfaq M, Lu D, Balaprakash P (2024) ORBIT: Oak Ridge base foundation model for earth system predictability. 
Preprint at arXiv:2404.14712","DOI":"10.1109\/SC41406.2024.00007"},{"key":"8264_CR68","unstructured":"Petitet A, Whaley RC, Dongarra J, Cleary A HPL\u2014A portable implementation of the high-performance Linpack benchmark for distributed-memory computers. https:\/\/netlib.org\/benchmark\/hpl\/ Accessed 08 Aug 2024"},{"key":"8264_CR69","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2021.102870","volume":"111","author":"K \u015awirydowicz","year":"2022","unstructured":"\u015awirydowicz K, Darve E, Jones W, Maack J, Regev S, Saunders MA, Thomas SJ, Pele\u0161 S (2022) Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers. Parallel Comput 111:102870. https:\/\/doi.org\/10.1016\/j.parco.2021.102870","journal-title":"Parallel Comput"},{"issue":"2","key":"8264_CR70","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1137\/0710032","volume":"10","author":"A George","year":"1973","unstructured":"George A (1973) Nested dissection of a regular finite element mesh. SIAM J Numer Anal 10(2):345","journal-title":"SIAM J Numer Anal"},{"key":"8264_CR71","doi-asserted-by":"crossref","unstructured":"George A (1980) An automatic one-way dissection algorithm for irregular finite element problems. SIAM J Numer Anal. 17(6)","DOI":"10.1137\/0717062"},{"issue":"138","key":"8264_CR72","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1090\/S0025-5718-1977-0431719-X","volume":"31","author":"A Brandt","year":"1977","unstructured":"Brandt A (1977) Multi-level adaptive solutions to boundary-value problems. Math Comput 31(138):333","journal-title":"Math Comput"},{"issue":"2","key":"8264_CR73","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1145\/321386.321394","volume":"14","author":"CB Moler","year":"1967","unstructured":"Moler CB (1967) Iterative refinement in floating point. J ACM 14(2):316\u2013321. 
https:\/\/doi.org\/10.1145\/321386.321394","journal-title":"J ACM"},{"key":"8264_CR74","doi-asserted-by":"publisher","unstructured":"Ma Z, Wang H, Feng G, Zhang C, Xie L, He J, Chen S, Zhai J (2022) Efficiently emulating high-bitwidth computation with low-bitwidth hardware. In: Proceedings of the 36th ACM International Conference on Supercomputing. ICS \u201922. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3524059.3532377","DOI":"10.1145\/3524059.3532377"},{"key":"8264_CR75","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s11075-011-9478-1","volume":"59","author":"K Ozaki","year":"2012","unstructured":"Ozaki K, Ogita T, Oishi S, Rump SM (2012) Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numer Algorithms 59:95\u2013118. https:\/\/doi.org\/10.1007\/s11075-011-9478-1","journal-title":"Numer Algorithms"},{"key":"8264_CR76","doi-asserted-by":"publisher","unstructured":"Mukunoki D, Ozaki K, Ogita T, Imamura T (2020) DGEMM using tensor cores, and its accurate and reproducible versions. In: International Conference on High Performance Computing. Springer, pp 230\u2013248. https:\/\/doi.org\/10.1007\/978-3-030-50743-5_12","DOI":"10.1007\/978-3-030-50743-5_12"},{"issue":"4","key":"8264_CR77","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1177\/10943420241239588","volume":"38","author":"H Ootomo","year":"2024","unstructured":"Ootomo H, Ozaki K, Yokota R (2024) DGEMM on integer matrix multiplication unit. Int J High Perform Comput Appl 38(4):297\u2013313. https:\/\/doi.org\/10.1177\/10943420241239588","journal-title":"Int J High Perform Comput Appl"},{"key":"8264_CR78","unstructured":"Bayraktar H (2025) Precision redefined: unlocking and delivering the full power of modern gpus for scientific computing. Platform for Advanced Scientific Computing (PASC). 
https:\/\/pasc25.pasc-conference.org\/presentation\/?id=msa270&sess=sess106"},{"key":"8264_CR79","unstructured":"Bernabeu SR (2025) Energy-efficient supercomputing through tensor core-accelerated mixed-precision computing and floating-point emulation. NVIDIA GTC. https:\/\/www.nvidia.com\/en-us\/on-demand\/session\/gtc25-s71487\/"},{"key":"8264_CR80","unstructured":"Brower C, Gunnels J, Lopez G (2025) Unlocking tensor core performance with floating point emulation in cuBLAS. https:\/\/developer.nvidia.com\/blog\/unlocking-tensor-core-performance-with-floating-point-emulation-in-cublas\/. Accessed 12 Dec 2025"},{"key":"8264_CR81","unstructured":"Uchino Y, Ozaki K, Imamura T (2024) Performance enhancement of the Ozaki scheme on integer matrix multiplication unit. https:\/\/arxiv.org\/abs\/2409.13313"},{"key":"8264_CR82","unstructured":"Abdelfattah A, Dongarra J, Fasi M, Mikaitis M, Tisseur F (2025) Analysis of floating-point matrix multiplication computed via integer arithmetic. https:\/\/arxiv.org\/abs\/2506.11277"},{"key":"8264_CR83","unstructured":"Ozaki K, Uchino Y, Imamura T (2025) Ozaki Scheme II: A GEMM-oriented emulation of floating-point matrix multiplication using an integer modular technique. https:\/\/arxiv.org\/abs\/2504.08009v3"},{"issue":"1","key":"8264_CR84","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1137\/22M1522619","volume":"46","author":"S Graillat","year":"2024","unstructured":"Graillat S, J\u00e9z\u00e9quel F, Mary T, Molina R (2024) Adaptive precision sparse matrix\u2013vector product and its application to Krylov solvers. SIAM J Sci Comput 46(1):30\u201356. https:\/\/doi.org\/10.1137\/22M1522619","journal-title":"SIAM J Sci Comput"},{"key":"8264_CR85","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2019.112701","volume":"372","author":"D Mukunoki","year":"2020","unstructured":"Mukunoki D, Ogita T (2020) Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs. 
J Comput Appl Math 372:112701. https:\/\/doi.org\/10.1016\/j.cam.2019.112701","journal-title":"J Comput Appl Math"},{"key":"8264_CR86","doi-asserted-by":"publisher","unstructured":"Haidar A, Tomov S, Dongarra J, Higham NJ (2018) Harnessing GPU tensor cores for fast fp16 arithmetic to speed up mixed-precision iterative refinement solvers. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 603\u2013613. https:\/\/doi.org\/10.1109\/SC.2018.00050","DOI":"10.1109\/SC.2018.00050"},{"key":"8264_CR87","doi-asserted-by":"publisher","unstructured":"Lu H, Matheson M, Oles V, Ellis A, Joubert W, Wang F (2022) Climbing the summit and pushing the frontier of mixed precision benchmarks at extreme scale. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. SC \u201922. IEEE Press. https:\/\/doi.org\/10.5555\/3571885.3571988","DOI":"10.5555\/3571885.3571988"},{"key":"8264_CR88","doi-asserted-by":"publisher","unstructured":"Abdelfattah A, Tomov S, Dongarra J (2020) Investigating the benefit of fp16-enabled mixed-precision solvers for symmetric positive definite matrices using gpus. In: Krzhizhanovskaya VV, Z\u00e1vodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J (eds) Computational Science - ICCS 2020. Springer, Cham, pp 237\u2013250. https:\/\/doi.org\/10.1007\/978-3-030-50417-5_18","DOI":"10.1007\/978-3-030-50417-5_18"},{"issue":"4","key":"8264_CR89","doi-asserted-by":"publisher","first-page":"1377597","DOI":"10.1145\/1377596.1377597","volume":"34","author":"A Buttari","year":"2008","unstructured":"Buttari A, Dongarra J, Kurzak J, Luszczek P, Tomov S (2008) Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy. ACM Trans Math Softw 34(4):1377597. 
https:\/\/doi.org\/10.1145\/1377596.1377597","journal-title":"ACM Trans Math Softw"},{"key":"8264_CR90","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663","author":"TA Davis","year":"2011","unstructured":"Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw. https:\/\/doi.org\/10.1145\/2049662.2049663","journal-title":"ACM Trans Math Softw"},{"issue":"1","key":"8264_CR91","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3582493","volume":"49","author":"P Amestoy","year":"2023","unstructured":"Amestoy P, Buttari A, Higham NJ, L\u2019Excellent J-I, Mary T, Vieubl\u00e9 B (2023) Combining sparse approximate factorizations with mixed-precision iterative refinement. ACM Trans Math Softw 49(1):1. https:\/\/doi.org\/10.1145\/3582493","journal-title":"ACM Trans Math Softw"},{"key":"8264_CR92","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.778","volume":"8","author":"M Zounon","year":"2022","unstructured":"Zounon M, Higham NJ, Lucas C, Tisseur F (2022) Performance impact of precision reduction in sparse linear systems solvers. PeerJ Comput Sci 8:e778. https:\/\/doi.org\/10.7717\/peerj-cs.778","journal-title":"PeerJ Comput Sci"},{"key":"8264_CR93","doi-asserted-by":"crossref","unstructured":"Loe JA, Glusa CA, Yamazaki I, Boman EG, Rajamanickam S (2021) A study of mixed precision strategies for GMRES on GPUs","DOI":"10.2172\/2001827"},{"issue":"2","key":"8264_CR94","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1177\/10943420221115140","volume":"37","author":"JI Aliaga","year":"2023","unstructured":"Aliaga JI, Anzt H, Gr\u00fctzmacher T, Quintana-Ort\u00ed ES, Tom\u00e1s AE (2023) Compressed basis gmres on high-performance graphics processing units. Int J High Perform Comput Appl 37(2):82\u2013100. 
https:\/\/doi.org\/10.1177\/10943420221115140","journal-title":"Int J High Perform Comput Appl"},{"key":"8264_CR95","doi-asserted-by":"crossref","unstructured":"Carson E, Gergelits T (2021) Mixed precision $$s$$-step Lanczos and conjugate gradient algorithms. https:\/\/arxiv.org\/abs\/2103.09210","DOI":"10.1002\/nla.2425"},{"key":"8264_CR96","unstructured":"Jang Y, Jolivet P, Mary T (2025) Mixed precision augmented GMRES. In: Working paper or preprint. https:\/\/hal.science\/hal-05163845"},{"key":"8264_CR97","doi-asserted-by":"publisher","unstructured":"Yamazaki I, Glusa C, Loe J, Luszczek P, Rajamanickam S, Dongarra J (2022) High-performance GMRES multi-precision benchmark: design, performance, and challenges. In: 2022 IEEE\/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pp 112\u2013122. https:\/\/doi.org\/10.1109\/PMBS56514.2022.00015","DOI":"10.1109\/PMBS56514.2022.00015"},{"issue":"1","key":"8264_CR98","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1177\/1094342015593158","volume":"30","author":"J Dongarra","year":"2016","unstructured":"Dongarra J, Heroux MA, Luszczek P (2016) High-performance conjugate-gradient benchmark: a new metric for ranking high-performance computing systems. Int J High Perform Comput Appl 30(1):3\u201310. https:\/\/doi.org\/10.1177\/1094342015593158","journal-title":"Int J High Perform Comput Appl"},{"key":"8264_CR99","doi-asserted-by":"crossref","unstructured":"Kashi A, Koukpaizan N, Lu H, Matheson M, Oral S, Wang F (2025) Scaling the memory wall using mixed-precision\u2014HPG-MxP on an exascale machine. 
https:\/\/arxiv.org\/abs\/2507.11512","DOI":"10.1145\/3712285.3759877"},{"issue":"2","key":"8264_CR100","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3441850","volume":"47","author":"G Flegar","year":"2021","unstructured":"Flegar G, Anzt H, Cojean T, Quintana-Ort\u00ed ES (2021) Adaptive precision block-jacobi for high performance preconditioning in the ginkgo linear algebra software. ACM Trans Math Softw 47(2):1. https:\/\/doi.org\/10.1145\/3441850","journal-title":"ACM Trans Math Softw"},{"key":"8264_CR101","doi-asserted-by":"publisher","unstructured":"G\u00f6bel F, Gr\u00fctzmacher T, Ribizel T, Anzt H (2021) Mixed precision incomplete and factorized sparse approximate inverse preconditioning on GPUs. In: Sousa L, Roma N, Tom\u00e1s P (eds) Euro-Par 2021: parallel processing. Springer, Cham, pp 550\u2013564. https:\/\/doi.org\/10.1007\/978-3-030-85665-6_34","DOI":"10.1007\/978-3-030-85665-6_34"},{"key":"8264_CR102","doi-asserted-by":"publisher","unstructured":"Tsai Y-HM, Beams N, Anzt H (2023) Mixed precision algebraic multigrid on GPUs. In: Wyrzykowski R, Dongarra J, Deelman E, Karczewski K (eds) Parallel processing and applied mathematics. Springer, Cham, pp 113\u2013125. https:\/\/doi.org\/10.1007\/978-3-031-30442-2_9","DOI":"10.1007\/978-3-031-30442-2_9"},{"key":"8264_CR103","doi-asserted-by":"publisher","first-page":"280","DOI":"10.1016\/j.future.2023.07.024","volume":"149","author":"Y-HM Tsai","year":"2023","unstructured":"Tsai Y-HM, Beams N, Anzt H (2023) Three-precision algebraic multigrid on GPUs. Futur Gener Comput Syst 149:280\u2013293. https:\/\/doi.org\/10.1016\/j.future.2023.07.024","journal-title":"Futur Gener Comput Syst"},{"key":"8264_CR104","doi-asserted-by":"publisher","unstructured":"Tsai Y-H (2024) Portable mixed precision algebraic multigrid on high performance GPUs. In: PhD thesis, Karlsruher Institut f\u00fcr Technologie (KIT). 
https:\/\/doi.org\/10.5445\/IR\/1000168914","DOI":"10.5445\/IR\/1000168914"},{"key":"8264_CR105","doi-asserted-by":"publisher","unstructured":"Sorna A, Cheng X, D\u2019Azevedo E, Won K, Tomov S (2018) Optimizing the fast Fourier transform using mixed precision on tensor core hardware. In: 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), pp 3\u20137. https:\/\/doi.org\/10.1109\/HiPCW.2018.8634417","DOI":"10.1109\/HiPCW.2018.8634417"},{"key":"8264_CR106","doi-asserted-by":"publisher","unstructured":"Li B, Cheng S, Lin J (2021) tcFFT: a fast half-precision FFT library for NVIDIA tensor cores. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp 1\u201311. https:\/\/doi.org\/10.1109\/Cluster48925.2021.00035","DOI":"10.1109\/Cluster48925.2021.00035"},{"issue":"3","key":"8264_CR107","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3605148","volume":"20","author":"Y Zhao","year":"2023","unstructured":"Zhao Y, Liu F, Ma W, Li H, Peng Y, Wang C (2023) MFFT: a GPU accelerated highly efficient mixed-precision large-scale FFT framework. ACM Trans Archit Code Optim 20(3):1. https:\/\/doi.org\/10.1145\/3605148","journal-title":"ACM Trans Archit Code Optim"},{"issue":"1","key":"8264_CR108","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1137\/20M1342902","volume":"64","author":"CT Kelley","year":"2022","unstructured":"Kelley CT (2022) Newton\u2019s method in mixed precision. SIAM Rev 64(1):191\u2013211. https:\/\/doi.org\/10.1137\/20M1342902","journal-title":"SIAM Rev"},{"key":"8264_CR109","doi-asserted-by":"publisher","unstructured":"Camps D, Mach T, Vandebril R, Watkins DS (2020) On pole-swapping algorithms for the eigenvalue problem. In: Electronic transactions on numerical analysis, vol 52. Kent State University, pp 480\u2013508. 
https:\/\/doi.org\/10.1553\/etna_vol52s480","DOI":"10.1553\/etna_vol52s480"},{"key":"8264_CR110","doi-asserted-by":"publisher","unstructured":"Tsai YM, Luszczek P, Dongarra J (2022) Mixed-precision algorithm for finding selected eigenvalues and eigenvectors of symmetric and Hermitian matrices. In: 2022 IEEE\/ACM Workshop on Latest Advances In Scalable Algorithms for Large-scale Heterogeneous Systems (ScalAH), pp 43\u201350. https:\/\/doi.org\/10.1109\/ScalAH56622.2022.00011","DOI":"10.1109\/ScalAH56622.2022.00011"},{"issue":"2","key":"8264_CR111","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1007\/s13160-019-00360-8","volume":"36","author":"A Alvermann","year":"2019","unstructured":"Alvermann A, Basermann A, Bungartz H-J, Carbogno C, Ernst D, Fehske H, Futamura Y, Galgon M, Hager G, Huber S, Huckle T, Ida A, Imakura A, Kawai M, K\u00f6cher S, Kreutzer M, Kus P, Lang B, Lederer H, Manin V, Marek A, Nakajima K, Nemec L, Reuter K, Rippl M, R\u00f6hrig-Z\u00f6llner M, Sakurai T, Scheffler M, Scheurer C, Shahzad F, Simoes Brambila D, Thies J, Wellein G (2019) Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects. Jpn J Ind Appl Math 36(2):699\u2013717. https:\/\/doi.org\/10.1007\/s13160-019-00360-8","journal-title":"Jpn J Ind Appl Math"},{"key":"8264_CR112","unstructured":"Kodali N, Ramakrishnan K, Motamarri P (2025) Residual-based Chebyshev filtered subspace iteration for sparse Hermitian eigenvalue problems tolerant to inexact matrix-vector products. https:\/\/arxiv.org\/abs\/2503.22652"},{"key":"8264_CR113","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s10915-022-01801-2","volume":"92","author":"ZJ Grant","year":"2022","unstructured":"Grant ZJ (2022) Perturbed Runge-Kutta methods for mixed precision applications. J Sci Comput 92:6. 
https:\/\/doi.org\/10.1007\/s10915-022-01801-2","journal-title":"J Sci Comput"},{"key":"8264_CR114","series-title":"Springer Series in Computational Mathematics","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-70956-3","volume-title":"B-series: algebraic analysis of numerical methods","author":"JC Butcher","year":"2021","unstructured":"Butcher JC (2021) B-series: algebraic analysis of numerical methods, 1st edn. Springer Series in Computational Mathematics. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-030-70956-3","edition":"1"},{"key":"8264_CR115","unstructured":"Burnett B, Gottlieb S, Grant ZJ (2022) Stability analysis and performance evaluation of mixed-precision Runge-Kutta methods. https:\/\/arxiv.org\/abs\/2212.11849"},{"key":"8264_CR116","doi-asserted-by":"crossref","unstructured":"Dravins I, Koch M, Griehl V, Kormann K (2024) Performance evaluation of mixed-precision Runge-Kutta methods for the solution of partial differential equations. https:\/\/arxiv.org\/abs\/2412.16638","DOI":"10.1177\/10943420251392963"},{"key":"8264_CR117","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2022.111349","volume":"464","author":"M Croci","year":"2022","unstructured":"Croci M, Rosilho de Souza G (2022) Mixed-precision explicit stabilized Runge-Kutta methods for single- and multi-scale differential equations. J Comput Phys 464:111349. https:\/\/doi.org\/10.1016\/j.jcp.2022.111349","journal-title":"J Comput Phys"},{"key":"8264_CR118","doi-asserted-by":"publisher","unstructured":"Balos CJ, Roberts S, Gardner DJ (2023) Leveraging mixed precision in exponential time integration methods. In: 2023 IEEE High Performance Extreme Computing Conference (HPEC), pp 1\u20138. 
https:\/\/doi.org\/10.1109\/HPEC58863.2023.10363489","DOI":"10.1109\/HPEC58863.2023.10363489"},{"key":"8264_CR119","doi-asserted-by":"publisher","unstructured":"Shin W, Schulz KW, Lorenzon AF, Maiterth M, Villasenor Alvarez B, Polo J, Kashi A, Lu H, Koukpaizan N, Georgiadou A, Norman M, Elwasif W, Matheson M, Wang F, Frontiere N, Oral S, Beck T, Messer B (2025) Bridging the gap: User-centric energy monitoring for policy-driven application optimization in HPC data centers. In: Proceedings of the SC \u201925 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC Workshops \u201925, pp 2007\u20132016. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3731599.3767564","DOI":"10.1145\/3731599.3767564"},{"key":"8264_CR120","unstructured":"Chakravarty A (2024) Deep learning models in speech recognition: measuring gpu energy consumption, impact of noise and model quantization for edge deployment. https:\/\/arxiv.org\/abs\/2405.01004"},{"key":"8264_CR121","doi-asserted-by":"crossref","unstructured":"Kermani A, Zeraatkar E, Irani H (2025) Energy-efficient transformer inference: optimization strategies for time series classification. https:\/\/arxiv.org\/abs\/2502.16627","DOI":"10.5120\/ijca2025924771"},{"key":"8264_CR122","doi-asserted-by":"publisher","unstructured":"Haidar A, Abdelfattah A, Zounon M, Wu P, Pranesh S, Tomov S, Dongarra J (2018) The design of fast and energy-efficient linear solvers: on the potential of half-precision arithmetic and iterative refinement techniques. In: Shi Y, Fu H, Tian Y, Krzhizhanovskaya VV, Lees MH, Dongarra J, Sloot PMA (eds) Computational Science\u2014ICCS 2018. Springer, pp 586\u2013600. 
https:\/\/doi.org\/10.1007\/978-3-319-93698-7_45","DOI":"10.1007\/978-3-319-93698-7_45"},{"issue":"3","key":"8264_CR123","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1007\/s00450-010-0124-2","volume":"25","author":"H Anzt","year":"2010","unstructured":"Anzt H, Rocker B, Heuveline V (2010) Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms. Comput Sci Res Dev 25(3):141. https:\/\/doi.org\/10.1007\/s00450-010-0124-2","journal-title":"Comput Sci Res Dev"},{"key":"8264_CR124","doi-asserted-by":"crossref","unstructured":"Kulkarni K, Kemmler S, Schwarz A, Gedik G, Chen Y, Papageorgiou D, Kavroulakis I, Iakymchuk R (2025) Harvesting energy consumption on European HPC systems: sharing experience from the CEEC project. https:\/\/arxiv.org\/abs\/2511.03029","DOI":"10.1145\/3784828.3785161"},{"issue":"1","key":"8264_CR125","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1145\/3480935","volume":"48","author":"H Anzt","year":"2022","unstructured":"Anzt H, Cojean T, Flegar G, G\u00f6bel F, Gr\u00fctzmacher T, Nayak P, Ribizel T, Tsai YM, Quintana-Ort\u00ed ES (2022) Ginkgo: a modern linear operator algebra framework for high performance computing. ACM Trans Math Softw 48(1):2\u20131233. https:\/\/doi.org\/10.1145\/3480935","journal-title":"ACM Trans Math Softw"},{"key":"8264_CR126","doi-asserted-by":"publisher","DOI":"10.1145\/3779120","author":"S Heldens","year":"2025","unstructured":"Heldens S, Werkhoven B (2025) Kernel Float: Unlocking mixed-precision GPU programming. ACM Trans Math Softw. https:\/\/doi.org\/10.1145\/3779120","journal-title":"ACM Trans Math Softw"},{"issue":"5\u20136","key":"8264_CR127","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1016\/j.parco.2009.12.005","volume":"36","author":"S Tomov","year":"2010","unstructured":"Tomov S, Dongarra J, Baboulin M (2010) Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput 36(5\u20136):232\u2013240. 
https:\/\/doi.org\/10.1016\/j.parco.2009.12.005","journal-title":"Parallel Comput"},{"issue":"12","key":"8264_CR128","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1016\/j.parco.2010.06.001","volume":"36","author":"S Tomov","year":"2010","unstructured":"Tomov S, Nath R, Dongarra J (2010) Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing. Parallel Comput 36(12):645\u2013654. https:\/\/doi.org\/10.1016\/j.parco.2010.06.001","journal-title":"Parallel Comput"},{"key":"8264_CR129","doi-asserted-by":"publisher","unstructured":"Chow E, Anzt H, Dongarra J (2015) Asynchronous iterative algorithm for computing incomplete factorizations on GPUs. In: Kunkel JM, Ludwig T (eds) High performance computing. Springer, Cham, pp 1\u201316. https:\/\/doi.org\/10.1007\/978-3-319-20119-1_1","DOI":"10.1007\/978-3-319-20119-1_1"},{"key":"8264_CR130","doi-asserted-by":"publisher","unstructured":"Haidar A, Wu P, Tomov S, Dongarra J (2017) Investigating half precision arithmetic to accelerate dense linear system solvers. In: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-scale Systems. ScalA \u201917. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3148226.3148237","DOI":"10.1145\/3148226.3148237"},{"key":"8264_CR131","doi-asserted-by":"publisher","unstructured":"Gates M, Kurzak J, Charara A, YarKhan A, Dongarra J (2019) Slate: design of a modern distributed and accelerated linear algebra library. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201919. Association for Computing Machinery, New York. 
https:\/\/doi.org\/10.1145\/3295500.3356223","DOI":"10.1145\/3295500.3356223"},{"key":"8264_CR132","doi-asserted-by":"publisher","DOI":"10.1177\/10943420241286531","author":"M Gates","year":"2024","unstructured":"Gates M, Abdelfattah A, Akbudak K, Farhan MA, Alomairy R, Bielich D, Burgess T, Cayrols S, Lindquist N, Sukkari D, YarKhan A (2024) Evolution of the slate linear algebra library. Int J High Perform Comput Appl. https:\/\/doi.org\/10.1177\/10943420241286531","journal-title":"Int J High Perform Comput Appl"},{"issue":"3","key":"8264_CR133","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1515\/jnma-2023-0089","volume":"31","author":"D Arndt","year":"2023","unstructured":"Arndt D, Bangerth W, Bergbauer M, Feder M, Fehling M, Heinz J, Heister T, Heltai L, Kronbichler M, Maier M, Munch P, Pelteret J-P, Turcksin B, Wells D, Zampini S (2023) The deal.II library, version 9.5. J Numer Math 31(3):231\u2013246. https:\/\/doi.org\/10.1515\/jnma-2023-0089","journal-title":"J Numer Math"},{"key":"8264_CR134","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1016\/j.camwa.2020.06.009","volume":"81","author":"R Anderson","year":"2021","unstructured":"Anderson R, Andrej J, Barker A, Bramwell J, Camier J-S, Cerveny J, Dobrev V, Dudouit Y, Fisher A, Kolev T, Pazner W, Stowell M, Tomov V, Akkerman I, Dahm J, Medina D, Zampini S (2021) MFEM: a modular finite element methods library. Comput Math Appl 81:42\u201374. https:\/\/doi.org\/10.1016\/j.camwa.2020.06.009","journal-title":"Comput Math Appl"},{"key":"8264_CR135","doi-asserted-by":"publisher","unstructured":"Gardner DJ, Reynolds DR, Woodward CS, Balos CJ (2022) Enabling new flexibility in the SUNDIALS suite of nonlinear and differential\/algebraic equation solvers. ACM Trans Math Softw (TOMS). 
https:\/\/doi.org\/10.1145\/3539801","DOI":"10.1145\/3539801"},{"issue":"5","key":"8264_CR136","doi-asserted-by":"publisher","DOI":"10.1063\/1.4983320","volume":"24","author":"R Hager","year":"2017","unstructured":"Hager R, Lang J, Chang CS, Ku S, Chen Y, Parker SE, Adams MF (2017) Verification of long wavelength electromagnetic modes with a gyrokinetic-fluid hybrid model in the XGC code. Phys Plasmas 24(5):054508. https:\/\/doi.org\/10.1063\/1.4983320","journal-title":"Phys Plasmas"},{"key":"8264_CR137","unstructured":"Team T (2020) The Trilinos Project Website. Accessed 01 Sept 2025. https:\/\/trilinos.github.io"},{"issue":"3","key":"8264_CR138","doi-asserted-by":"publisher","DOI":"10.3233\/SPR-2012-0352","volume":"20","author":"E Bavier","year":"2012","unstructured":"Bavier E, Hoemmen M, Rajamanickam S, Thornquist H (2012) Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems. Sci Program 20(3):243875. https:\/\/doi.org\/10.3233\/SPR-2012-0352","journal-title":"Sci Program"},{"issue":"2","key":"8264_CR139","doi-asserted-by":"publisher","DOI":"10.3233\/SPR-2012-0349","volume":"20","author":"CG Baker","year":"2012","unstructured":"Baker CG, Heroux MA (2012) Tpetra, and the use of generic programming in scientific computing. Sci Program 20(2):693861. 
https:\/\/doi.org\/10.3233\/SPR-2012-0349","journal-title":"Sci Program"},{"key":"8264_CR140","unstructured":"Rajamanickam S, Acer S, Berger-Vergiat L, Dang V, Ellingwood N, Harvey E, Kelley B, Trott CR, Wilke J, Yamazaki I (2021) Kokkos Kernels: performance portable sparse\/dense linear algebra and graph kernels"},{"issue":"4","key":"8264_CR141","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1109\/TPDS.2021.3097283","volume":"33","author":"CR Trott","year":"2022","unstructured":"Trott CR, Lebrun-Grandi\u00e9 D, Arndt D, Ciesko J, Dang V, Ellingwood N, Gayatri R, Harvey E, Hollman DS, Ibanez D, Liber N, Madsen J, Miles J, Poliakoff D, Powell A, Rajamanickam S, Simberg M, Sunderland D, Turcksin B, Wilke J (2022) Kokkos 3: Programming model extensions for the exascale era. IEEE Trans Parallel Distrib Syst 33(4):805\u2013817. https:\/\/doi.org\/10.1109\/TPDS.2021.3097283","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"5","key":"8264_CR142","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1109\/MCSE.2021.3098509","volume":"23","author":"C Trott","year":"2021","unstructured":"Trott C, Berger-Vergiat L, Poliakoff D, Rajamanickam S, Lebrun-Grandie D, Madsen J, Al Awar N, Gligoric M, Shipman G, Womeldorff G (2021) The kokkos ecosystem: Comprehensive performance portability for high performance computing. Comput Sci Eng 23(5):10\u201318. https:\/\/doi.org\/10.1109\/MCSE.2021.3098509","journal-title":"Comput Sci Eng"},{"key":"8264_CR143","doi-asserted-by":"publisher","unstructured":"Yamazaki I, Carson E, Kelley B (2022) Mixed precision $$s$$-step conjugate gradient with residual replacement on GPUs. In: 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 886\u2013896. 
https:\/\/doi.org\/10.1109\/IPDPS53621.2022.00091","DOI":"10.1109\/IPDPS53621.2022.00091"},{"key":"8264_CR144","unstructured":"Berger-Vergiat L, Glusa CA, Harper G, Hu JJ, Mayr M, Ohm P, Prokopenko A, Siefert CM, Tuminaro RS, Wiesner TA (2023) MueLu user\u2019s guide. In: Technical Report SAND2023-12265, Sandia National Laboratories"},{"key":"8264_CR145","doi-asserted-by":"crossref","unstructured":"Balay S, Gropp WD, McInnes LC, Smith BF (1997) Efficient management of parallelism in object oriented numerical software libraries. In: Arge E, Bruaset AM, Langtangen HP (eds) Modern software tools in scientific computing. Birkhauser Press, pp 163\u2013202","DOI":"10.1007\/978-1-4612-1986-6_8"},{"key":"8264_CR146","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2021.102831","volume":"108","author":"RT Mills","year":"2021","unstructured":"Mills RT, Adams MF, Balay S, Brown J, Dener A, Knepley M, Kruger SE, Morgan H, Munson T, Rupp K, Smith BF, Zampini S, Zhang H, Zhang J (2021) Toward performance-portable PETSc for GPU-based exascale systems. Parallel Comput 108:102831. https:\/\/doi.org\/10.1016\/j.parco.2021.102831","journal-title":"Parallel Comput"},{"issue":"3","key":"8264_CR147","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1145\/1089014.1089019","volume":"31","author":"V Hernandez","year":"2005","unstructured":"Hernandez V, Roman JE, Vidal V (2005) SLEPc: a scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans Math Software 31(3):351\u2013362","journal-title":"ACM Trans Math Software"},{"issue":"3","key":"8264_CR148","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1145\/3603373","volume":"49","author":"JE Roman","year":"2023","unstructured":"Roman JE, Alvarruiz F, Campos C, Dalcin L, Jolivet P, Lamas Davi\u00f1a A (2023) Improvements to SLEPc in releases 3.14\u20133.18. 
ACM Trans Math Software 49(3):29\u201312911","journal-title":"ACM Trans Math Software"},{"key":"8264_CR149","doi-asserted-by":"crossref","unstructured":"Falgout RD, Yang UM (2002) hypre: a library of high performance preconditioners. In: Sloot PMA, Tan CJK, Dongarra JJ, Hoekstra AG (eds) Lecture notes in computer science, vol 2331. Springer, pp 632\u2013641. UCRL-JC-146175","DOI":"10.1007\/3-540-47789-6_66"},{"issue":"3","key":"8264_CR150","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1145\/1089014.1089020","volume":"31","author":"AC Hindmarsh","year":"2005","unstructured":"Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS (2005) SUNDIALS: Suite of nonlinear and differential\/algebraic equation solvers. ACM Trans Math Softw (TOMS) 31(3):363\u2013396. https:\/\/doi.org\/10.1145\/1089014.1089020","journal-title":"ACM Trans Math Softw (TOMS)"},{"key":"8264_CR151","doi-asserted-by":"publisher","unstructured":"Ayala A, Tomov S, Haidar A, Dongarra J (2020) heffte: Highly efficient fft for exascale. In: Computational Science\u2014ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3\u20135, 2020, Proceedings, Part I. Springer, Berlin, Heidelberg, pp 262\u2013275. https:\/\/doi.org\/10.1007\/978-3-030-50371-0_19","DOI":"10.1007\/978-3-030-50371-0_19"},{"key":"8264_CR152","unstructured":"Cayrols S, Li J, Bosilca G, Tomov S, Ayala A, Dongarra J (2022) Mixed precision and approximate 3D FFTs: speed for accuracy trade-off with GPU-aware MPI and run-time data compression. In: ICL technical report ICL-UT-22-04, Innovative Computing Laboratory, University of Tennessee, Knoxville (2022\u201305 2022)"},{"key":"8264_CR153","unstructured":"Li B (2021) tcFFT. Accessed 30 July 2025. https:\/\/github.com\/rox906\/tcFFT"},{"key":"8264_CR154","doi-asserted-by":"publisher","unstructured":"Hoerold F, Ivanov IR, Dhruv A, Moses WS, Dubey A, Wahib M, Domke J (2025) Raptor: practical numerical profiling of scientific applications. 
In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC \u201925, pp 661\u2013680. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3712285.3759810","DOI":"10.1145\/3712285.3759810"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-026-08264-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-026-08264-4","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-026-08264-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T01:39:54Z","timestamp":1774316394000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-026-08264-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,24]]},"references-count":154,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2026,4]]}},"alternative-id":["8264"],"URL":"https:\/\/doi.org\/10.1007\/s11227-026-08264-4","relation":{},"ISSN":["1573-0484"],"issn-type":[{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,24]]},"assertion":[{"value":"8 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 January 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"287"}}