{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T15:53:37Z","timestamp":1780329217554,"version":"3.54.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,7,21]],"date-time":"2020-07-21T00:00:00Z","timestamp":1595289600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Darpa XDATA"},{"name":"HP"},{"DOI":"10.13039\/100004356","name":"Nokia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100004356","id-type":"DOI","asserted-by":"crossref"}]},{"name":"DOE Computational Science Graduate Fellowship","award":["DE-FG02-97ER25308"],"award-info":[{"award-number":["DE-FG02-97ER25308"]}]},{"DOI":"10.13039\/100014600","name":"Mathworks","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100014600","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000185","name":"DARPA","doi-asserted-by":"crossref","award":["HR0011-12-2-0016"],"award-info":[{"award-number":["HR0011-12-2-0016"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]},{"name":"ASPIRE Lab"},{"name":"LGE"},{"name":"Samsung"},{"DOI":"10.13039\/100008297","name":"Cray","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100008297","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSF","award":["NSF ACI-1339676, NSF DMS-1312831"],"award-info":[{"award-number":["NSF ACI-1339676, NSF DMS-1312831"]}]},{"name":"Intel"},{"name":"DOE","award":["DOE DE-SC0010200, DOE DE-SC0008699, DOE DE-SC0008700, and DOE AC02-05CH11231"],"award-info":[{"award-number":["DOE DE-SC0010200, DOE DE-SC0008699, DOE DE-SC0008700, and DOE AC02-05CH11231"]}]},{"name":"Intel 
ITSC"},{"DOI":"10.13039\/100006785","name":"Google","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Huawei"},{"DOI":"10.13039\/100007065","name":"NVIDIA","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100007065","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Oracle"},{"DOI":"10.13039\/100016861","name":"Aramco","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100016861","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"<jats:p>\n            We define \u201creproducibility\u201d as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a \u201creproducible accumulator\u201d data structure (the \u201cbinned number\u201d) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. 
Summing\n            <jats:italic>n<\/jats:italic>\n            words with a 6-word reproducible accumulator requires approximately 9\n            <jats:italic>n<\/jats:italic>\n            floating point operations (arithmetic, comparison, and absolute value) and approximately 3\n            <jats:italic>n<\/jats:italic>\n            bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2\n            <jats:sup>29<\/jats:sup>\n            times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.\n          <\/jats:p>","DOI":"10.1145\/3389360","type":"journal-article","created":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T12:37:18Z","timestamp":1594125438000},"page":"1-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Algorithms for Efficient Reproducible Floating Point Summation"],"prefix":"10.1145","volume":"46","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4963-0869","authenticated-orcid":false,"given":"Willow","family":"Ahrens","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James","family":"Demmel","sequence":"additional","affiliation":[{"name":"University of California Berkeley, Berkeley, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hong Diep","family":"Nguyen","sequence":"additional","affiliation":[{"name":"University of California Berkeley, Berkeley, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,7,21]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"IEEE standard for floating-point arithmetic","author":"EE.","year":"2008","unstructured":"IE EE. 2008. IEEE standard for floating-point arithmetic . IEEE Std 754- 2008 (Aug. 
2008), 1--70. DOI:https:\/\/doi.org\/10.1109\/IEEESTD.2008.4610935 10.1109\/IEEESTD.2008.4610935 IEEE. 2008. IEEE standard for floating-point arithmetic. IEEE Std 754-2008 (Aug. 2008), 1--70. DOI:https:\/\/doi.org\/10.1109\/IEEESTD.2008.4610935"},{"key":"e_1_2_1_2_1","unstructured":"Intel. 2018. Developer Reference for Intel\u00ae Math Kernel Library 2018 - C | Intel\u00ae Software. Retrieved from https:\/\/software.intel.com\/en-us\/download\/developer-reference-for-intel-math-kernel-library-2018-c.  Intel. 2018. Developer Reference for Intel\u00ae Math Kernel Library 2018 - C | Intel\u00ae Software. Retrieved from https:\/\/software.intel.com\/en-us\/download\/developer-reference-for-intel-math-kernel-library-2018-c."},{"key":"e_1_2_1_3_1","unstructured":"NVIDIA. 2018. NVIDIA\u00ae cuBLAS. Retrieved from http:\/\/docs.nvidia.com\/cuda\/cublas\/index.html.  NVIDIA. 2018. NVIDIA\u00ae cuBLAS. Retrieved from http:\/\/docs.nvidia.com\/cuda\/cublas\/index.html."},{"key":"e_1_2_1_4_1","unstructured":"Intel. 2019. bfloat16 - HardwareNumerics Definition. Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/40\/8b\/bf16-hardware-numerics-definition-white-paper.pdf.  Intel. 2019. bfloat16 - HardwareNumerics Definition. Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/40\/8b\/bf16-hardware-numerics-definition-white-paper.pdf."},{"key":"e_1_2_1_5_1","volume-title":"IEEE standard for floating-point arithmetic","author":"EE.","year":"2019","unstructured":"IE EE. 2019. IEEE standard for floating-point arithmetic . IEEE Std 754- 2019 (July 2019), 1--84. DOI:https:\/\/doi.org\/10.1109\/IEEESTD.2019.8766229 10.1109\/IEEESTD.2019.8766229 IEEE. 2019. IEEE standard for floating-point arithmetic. IEEE Std 754-2019 (July 2019), 1--84. 
DOI:https:\/\/doi.org\/10.1109\/IEEESTD.2019.8766229"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.1991.145558"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201914)","author":"Arteaga A.","year":"2014","unstructured":"A. Arteaga , O. Fuhrer , and T. Hoefler . 2014. Designing bit-reproducible portable high-performance applications . In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201914) . 1235--1244. DOI:https:\/\/doi.org\/10.1109\/IPDPS. 2014 .127 10.1109\/IPDPS.2014.127 A. Arteaga, O. Fuhrer, and T. Hoefler. 2014. Designing bit-reproducible portable high-performance applications. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201914). 1235--1244. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2014.127"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915)","author":"Chohra C.","unstructured":"C. Chohra , P. Langlois , and D. Parello . 2015. Efficiency of reproducible level 1 BLAS . In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915) . Springer, Cham, 99--108. DOI:https:\/\/doi.org\/10.1007\/978-3-319-31769-4_8 10.1007\/978-3-319-31769-4_8 C. Chohra, P. Langlois, and D. Parello. 2015. Efficiency of reproducible level 1 BLAS. In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915). Springer, Cham, 99--108. DOI:https:\/\/doi.org\/10.1007\/978-3-319-31769-4_8"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the Euro-Par Parallel Processing Workshops. Springer, Cham, 609--620","author":"Chohra C.","unstructured":"C. Chohra , P. Langlois , and D. Parello . 2016. Reproducible, accurately rounded and efficient BLAS . 
In Proceedings of the Euro-Par Parallel Processing Workshops. Springer, Cham, 609--620 . DOI:https:\/\/doi.org\/10.1007\/978-3-319-58943-5_49 10.1007\/978-3-319-58943-5_49 C. Chohra, P. Langlois, and D. Parello. 2016. Reproducible, accurately rounded and efficient BLAS. In Proceedings of the Euro-Par Parallel Processing Workshops. Springer, Cham, 609--620. DOI:https:\/\/doi.org\/10.1007\/978-3-319-58943-5_49"},{"key":"#cr-split#-e_1_2_1_10_1.1","doi-asserted-by":"crossref","unstructured":"S. Collange D. Defour S. Graillat and R. Iakymchuk. 2015. Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Comput. 49 (Nov. 2015) 83--97. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.09.001 10.1016\/j.parco.2015.09.001","DOI":"10.1016\/j.parco.2015.09.001"},{"key":"#cr-split#-e_1_2_1_10_1.2","doi-asserted-by":"crossref","unstructured":"S. Collange D. Defour S. Graillat and R. Iakymchuk. 2015. Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Comput. 49 (Nov. 2015) 83--97. DOI:https:\/\/doi.org\/10.1016\/j.parco.2015.09.001","DOI":"10.1016\/j.parco.2015.09.001"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01397083"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the SC 2015 Birds of a Feather Sessions.","author":"Demmel J.","unstructured":"J. Demmel , G. Gopalakrishnan , M. Heroux , W. Keyrouz , and K. Sato . 2015. Reproducibility of high performance codes and simulations: Tools, techniques, debugging . In Proceedings of the SC 2015 Birds of a Feather Sessions. Retrieved from https:\/\/gcl.cis.udel.edu\/sc15bof.php. J. Demmel, G. Gopalakrishnan, M. Heroux, W. Keyrouz, and K. Sato. 2015. Reproducibility of high performance codes and simulations: Tools, techniques, debugging. In Proceedings of the SC 2015 Birds of a Feather Sessions. 
Retrieved from https:\/\/gcl.cis.udel.edu\/sc15bof.php."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827502407627"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the Symposium on Computer Arithmetic (ARITH\u201913)","author":"Demmel J.","year":"2013","unstructured":"J. Demmel and H. D. Nguyen . 2013. Fast reproducible floating-point summation . In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201913) . 163--172. DOI:https:\/\/doi.org\/10.1109\/ARITH. 2013 .9 10.1109\/ARITH.2013.9 J. Demmel and H. D. Nguyen. 2013. Fast reproducible floating-point summation. In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201913). 163--172. DOI:https:\/\/doi.org\/10.1109\/ARITH.2013.9"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2345391"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Conference on Programming Language Design and Implementation (PLDI\u201994)","author":"Granlund T.","unstructured":"T. Granlund and P. L. Montgomery . 1994. Division by invariant integers using multiplication . In Proceedings of the Conference on Programming Language Design and Implementation (PLDI\u201994) . 61--72. DOI:https:\/\/doi.org\/10.1145\/178243.178249 10.1145\/178243.178249 T. Granlund and P. L. Montgomery. 1994. Division by invariant integers using multiplication. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI\u201994). 61--72. DOI:https:\/\/doi.org\/10.1145\/178243.178249"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Symposium on Computer Arithmetic (ARITH\u201901)","author":"Hida Y.","year":"2001","unstructured":"Y. Hida , X. S. Li , and D. H. Bailey . 2001. Algorithms for quad-double precision floating point arithmetic . In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201901) . 155--162. DOI:https:\/\/doi.org\/10.1109\/ARITH. 2001 .930115 10.1109\/ARITH.2001.930115 Y. Hida, X. S. Li, and D. H. Bailey. 2001. 
Algorithms for quad-double precision floating point arithmetic. In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201901). 155--162. DOI:https:\/\/doi.org\/10.1109\/ARITH.2001.930115"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1137\/0914050"},{"key":"e_1_2_1_19_1","volume-title":"Accuracy and Stability of Numerical Algorithms","author":"Higham N.","unstructured":"N. Higham . 2002. Accuracy and Stability of Numerical Algorithms ( 2 nd ed.). Society for Industrial and Applied Mathematics . DOI:https:\/\/doi.org\/10.1137\/1.9780898718027 10.1137\/1.9780898718027 N. Higham. 2002. Accuracy and Stability of Numerical Algorithms (2nd ed.). Society for Industrial and Applied Mathematics. DOI:https:\/\/doi.org\/10.1137\/1.9780898718027","edition":"2"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2019.2926614"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the SC 2015 Numerical Reproducibility at Exascale Workshops (NRE\u201915)","author":"Iakymchuk R.","year":"2015","unstructured":"R. Iakymchuk , S. Collange , D. Defour , and S. Graillat . 2015. ExBLAS: Reproducible and accurate BLAS library . In Proceedings of the SC 2015 Numerical Reproducibility at Exascale Workshops (NRE\u201915) . Retrieved from https:\/\/hal.archives-ouvertes.fr\/hal-01202396. R. Iakymchuk, S. Collange, D. Defour, and S. Graillat. 2015. ExBLAS: Reproducible and accurate BLAS library. In Proceedings of the SC 2015 Numerical Reproducibility at Exascale Workshops (NRE\u201915). Retrieved from https:\/\/hal.archives-ouvertes.fr\/hal-01202396."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915)","author":"Iakymchuk R.","unstructured":"R. Iakymchuk , D. Defour , S. Collange , and S. Graillat . 2015. Reproducible and accurate matrix multiplication . 
In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915) . Springer, Cham, 126--137. DOI:https:\/\/doi.org\/10.1007\/978-3-319-31769-4_11 10.1007\/978-3-319-31769-4_11 R. Iakymchuk, D. Defour, S. Collange, and S. Graillat. 2015. Reproducible and accurate matrix multiplication. In Proceedings of the Conference on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN\u201915). Springer, Cham, 126--137. DOI:https:\/\/doi.org\/10.1007\/978-3-319-31769-4_11"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on Information Technology - New Generations (ITNG\u201915)","author":"Iakymchuk R.","year":"2015","unstructured":"R. Iakymchuk , D. Defour , S. Collange , and S. Graillat . 2015. Reproducible triangular solvers for high-performance computing . In Proceedings of the International Conference on Information Technology - New Generations (ITNG\u201915) . 353--358. DOI:https:\/\/doi.org\/10.1109\/ITNG. 2015 .63 10.1109\/ITNG.2015.63 R. Iakymchuk, D. Defour, S. Collange, and S. Graillat. 2015. Reproducible triangular solvers for high-performance computing. In Proceedings of the International Conference on Information Technology - New Generations (ITNG\u201915). 353--358. DOI:https:\/\/doi.org\/10.1109\/ITNG.2015.63"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/363707.363723"},{"key":"e_1_2_1_25_1","volume-title":"The Art of Computer Programming 2: Seminumerical Algorithms","author":"Knuth D. E.","unstructured":"D. E. Knuth . 1969. The Art of Computer Programming 2: Seminumerical Algorithms . Addison-Wesley, Reading , MA. D. E. Knuth. 1969. The Art of Computer Programming 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA."},{"key":"e_1_2_1_26_1","volume-title":"Computer Arithmetic and Validity: Theory, Implementation, and Applications","author":"Kulisch U.","unstructured":"U. Kulisch . 2012. 
Computer Arithmetic and Validity: Theory, Implementation, and Applications ( 2 nd ed.). Walter de Gruyter . U. Kulisch. 2012. Computer Arithmetic and Validity: Theory, Implementation, and Applications (2nd ed.). Walter de Gruyter.","edition":"2"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Symposium on Computer Arithmetic (ARITH\u201917)","author":"Lutz D. R.","year":"2017","unstructured":"D. R. Lutz and C. N. Hinds . 2017. High-precision anchored accumulators for reproducible floating-point summation . In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201917) . 98--105. DOI:https:\/\/doi.org\/10.1109\/ARITH. 2017 .20 10.1109\/ARITH.2017.20 D. R. Lutz and C. N. Hinds. 2017. High-precision anchored accumulators for reproducible floating-point summation. In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201917). 98--105. DOI:https:\/\/doi.org\/10.1109\/ARITH.2017.20"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"J.-M. Muller N. Brunie F. Dinechin C.-P. Jeannerod M. Joldes V. Lef\u00e8vre G. Melquiond N. Revol and S. Torres. 2018. Handbook of Floating-Point Arithmetic (2nd ed.). Birkh\u00e4user Basel. Retrieved from http:\/\/www.springer.com\/us\/book\/9783319765259.  J.-M. Muller N. Brunie F. Dinechin C.-P. Jeannerod M. Joldes V. Lef\u00e8vre G. Melquiond N. Revol and S. Torres. 2018. Handbook of Floating-Point Arithmetic (2nd ed.). Birkh\u00e4user Basel. Retrieved from http:\/\/www.springer.com\/us\/book\/9783319765259.","DOI":"10.1007\/978-3-319-76526-6"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the Symposium on Computer Arithmetic (ARITH\u201918)","author":"Riedy J.","year":"2018","unstructured":"J. Riedy and J. Demmel . 2018. Augmented arithmetic operations proposed for IEEE-754 2018 . In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201918) . 45--52. DOI:https:\/\/doi.org\/10.1109\/ARITH. 2018 .8464813 10.1109\/ARITH.2018.8464813 J. Riedy and J. Demmel. 2018. 
Augmented arithmetic operations proposed for IEEE-754 2018. In Proceedings of the Symposium on Computer Arithmetic (ARITH\u201918). 45--52. DOI:https:\/\/doi.org\/10.1109\/ARITH.2018.8464813"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1137\/080738490"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1587\/nolta.1.2"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3389360","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3389360","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3389360","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:31Z","timestamp":1750200091000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3389360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,21]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3389360"],"URL":"https:\/\/doi.org\/10.1145\/3389360","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,21]]},"assertion":[{"value":"2016-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}