{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T15:55:13Z","timestamp":1780674913752,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3472493","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:39:57Z","timestamp":1633459197000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme"],"prefix":"10.1145","author":[{"given":"Daichi","family":"Mukunoki","sequence":"first","affiliation":[{"name":"RIKEN Center for Computational Science, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Katsuhisa","family":"Ozaki","sequence":"additional","affiliation":[{"name":"Shibaura Institute of Technology, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Takeshi","family":"Ogita","sequence":"additional","affiliation":[{"name":"Tokyo Woman's Christian University, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Toshiyuki","family":"Imamura","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"E. Anderson Z. Bai C. Bischof S. Blackford J. Demmel J. Dongarra J. Du\u00a0Croz A. 
Greenbaum S. Hammarling A. McKenney and D. Sorensen. 1999. LAPACK Users\u2019 Guide (third ed.). Society for Industrial and Applied Mathematics.","DOI":"10.1137\/1.9780898719604"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.amc.2012.03.087"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2018.2855729"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-60902-4_13"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01397083"},{"key":"e_1_3_2_1_8_1","volume-title":"The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. In International Conference on Computational Science (ICCS 2017","author":"Dongarra J.","year":"2017","unstructured":"J. Dongarra, S. Hammarling, N.\u00a0J. Higham, S.\u00a0D. Relton, P. Valero-Lara, and M. Zounon. 2017. The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. In International Conference on Computational Science (ICCS 2017), Vol.\u00a0108. 495\u2013504. https:\/\/doi.org\/10.1016\/j.procs.2017.05.138"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1236463.1236468"},{"key":"e_1_3_2_1_11_1","unstructured":"J. Hauser. 2018. Berkeley SoftFloat. http:\/\/www.jhauser.us\/arithmetic\/SoftFloat.html."},
{"key":"e_1_3_2_1_12_1","unstructured":"Y. Hida X.S. Li and D.H. Bailey. 2000. Quad-Double Arithmetic: Algorithms Implementation and Application. Technical Report LBNL-46996. Lawrence Berkeley National Laboratory."},{"key":"e_1_3_2_1_14_1","volume-title":"International Conference on Computational & Experimental Engineering and Sciences (ICCES","author":"Hishinuma T.","year":"2019","unstructured":"T. Hishinuma and M. Nakata. 2019. pzqd: PEZY-SC2 Acceleration of Double-Double Precision Arithmetic Library for High-Precision BLAS. In International Conference on Computational & Experimental Engineering and Sciences (ICCES 2019), Mechanisms and Machine Science, Vol.\u00a075. 717\u2013736. https:\/\/doi.org\/10.1007\/978-3-030-27053-7_61"},{"key":"e_1_3_2_1_15_1","unstructured":"R. Iakymchuk S. Collange D. Defour and S. Graillat. 2015. ExBLAS: Reproducible and Accurate BLAS Library. In Numerical Reproducibility at Exascale (NRE2015) at SC\u201915."},{"key":"e_1_3_2_1_16_1","volume-title":"Threaded Accurate Matrix-Matrix Multiplications with Sparse Matrix-Vector Multiplications. 
In 32nd IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018","author":"Ichimura S.","year":"2018","unstructured":"S. Ichimura, T. Katagiri, K. Ozaki, T. Ogita, and T. Nagai. 2018. Threaded Accurate Matrix-Matrix Multiplications with Sparse Matrix-Vector Multiplications. In 32nd IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018). 1093\u20131102. https:\/\/doi.org\/10.1109\/IPDPSW.2018.00168"},{"key":"e_1_3_2_1_17_1","volume-title":"IEEE Standard for Floating-Point Arithmetic","author":"IEEE Computer Society","year":"2008","unstructured":"IEEE Computer Society. 2008. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008 (2008), 1\u201358."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2982365"},{"key":"e_1_3_2_1_19_1","volume-title":"CAMPARY: Cuda Multiple Precision Arithmetic Library and Applications. In The 5th International Congress on Mathematical Software (ICMS","author":"Joldes M.","year":"2016","unstructured":"M. Joldes, J.-M. Muller, V. Popescu, and W. Tucker. 2016. CAMPARY: Cuda Multiple Precision Arithmetic Library and Applications. 
In The 5th International Congress on Mathematical Software (ICMS 2016), Lecture Notes in Computer Science, Vol.\u00a09725. 232\u2013240. https:\/\/doi.org\/10.1007\/978-3-319-42432-3_29"},{"key":"e_1_3_2_1_20_1","volume-title":"The Art of Computer Programming","author":"Knuth D. E.","unstructured":"D.\u00a0E. Knuth. 1969. The Art of Computer Programming Vol. 2 Seminumerical Algorithms. Addison-Wesley."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567808"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMAG.2019.2951280"},{"key":"e_1_3_2_1_23_1","volume-title":"The 35th JSST Annual Conference International Conference on Simulation Technology.","author":"Minamihata A.","unstructured":"A. Minamihata, K. Ozaki, T. Ogita, and S. Oishi. 2016. Improved extraction scheme for accurate floating-point summation. In The 35th JSST Annual Conference International Conference on Simulation Technology."},{"key":"e_1_3_2_1_24_1","volume-title":"Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-core Architectures. In 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019","author":"Mukunoki D.","year":"2020","unstructured":"D. Mukunoki, T. Ogita, and K. Ozaki. 2020. 
Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-core Architectures. In 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), Lecture Notes in Computer Science, Vol.\u00a012043. 516\u2013527. https:\/\/doi.org\/10.1007\/978-3-030-43229-4_44"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432261.3432270"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"D. Mukunoki K. Ozaki T. Ogita and T. Imamura. 2020. DGEMM using Tensor Cores and Its Accurate and Reproducible Versions. In ISC High Performance 2020 Lecture Notes in Computer Science Vol.\u00a012151. 230\u2013248. https:\/\/doi.org\/10.1007\/978-3-030-50743-5_12","DOI":"10.1007\/978-3-030-50743-5_12"},{"key":"e_1_3_2_1_27_1","volume-title":"Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW 2012","author":"Mukunoki D.","year":"2012","unstructured":"D. Mukunoki and D. Takahashi. 2012. Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW 2012). 1378\u20131386. 
https:\/\/doi.org\/10.1109\/IPDPSW.2012.175"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964218.1964227"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2012.31"},{"key":"e_1_3_2_1_30_1","volume-title":"High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL. In 2018 International Conference on Field-Programmable Technology (FPT\u201918)","author":"Nakasato N.","year":"2018","unstructured":"N. Nakasato, H. Daisaka, and T. Ishikawa. 2018. High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL. In 2018 International Conference on Field-Programmable Technology (FPT\u201918). 262\u2013265. https:\/\/doi.org\/10.1109\/FPT.2018.00049"},{"key":"e_1_3_2_1_31_1","volume-title":"The MPACK","author":"Nakata M.","unstructured":"M. Nakata. [n.d.]. The MPACK; Multiple precision arithmetic BLAS (MBLAS) and LAPACK (MLAPACK). http:\/\/mplapack.sourceforge.net."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CACSD.2010.5612693"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/030601818"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11075-011-9478-1"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.1991.145549"},{"key":"e_1_3_2_1_36_1","unstructured":"K. Tomonori. 2021. 
Acceleration of multiple precision matrix multiplication based on multi-component floating-point arithmetic using AVX2. arXiv:2101.06584\u00a0[math.NA]"},{"key":"e_1_3_2_1_37_1","volume-title":"SUMMA: Scalable Universal Matrix Multiplication Algorithm","author":"van\u00a0de Geijn R. A.","year":"1995","unstructured":"Robert\u00a0A. van\u00a0de Geijn and Jerrell Watts. 1995. SUMMA: Scalable Universal Matrix Multiplication Algorithm. Technical Report. USA."}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","location":"Lemont IL USA","acronym":"ICPP 2021"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472493","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3472493","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:11Z","timestamp":1750193291000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472493"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":34,"alternative-id":["10.1145\/3472456.3472493","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3472493","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}