{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:20:53Z","timestamp":1750220453474,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,9,15]],"date-time":"2020-09-15T00:00:00Z","timestamp":1600128000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100020899","name":"ETH Board of the Swiss Federal Institutes of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100020899","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001703","name":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001703","id-type":"DOI","asserted-by":"crossref"}]},{"name":"German Helmholtz Association"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"<jats:p>\n            The evaluation of small degree polynomials is critical for the computation of elementary functions. It has been extensively studied and is well documented. In this article, we evaluate existing methods for polynomial evaluation on superscalar architecture. In addition, we have completed this work with a factorization method, which is surprisingly neglected in the literature. This work focuses on out-of-order Intel processors, amongst others, of which computational units are available. Moreover, we applied our work on the elementary function\n            <jats:italic>e<\/jats:italic>\n            <jats:sup>\n              <jats:italic>x<\/jats:italic>\n            <\/jats:sup>\n            that requires, in the current implementation, an evaluation of a polynomial of degree 10 for a satisfying precision and performance. Our results show that the factorization scheme is the fastest in benchmarks, and that latency and throughput are intrinsically dependent on each other on superscalar architecture.\n          <\/jats:p>","DOI":"10.1145\/3408893","type":"journal-article","created":{"date-parts":[[2020,9,15]],"date-time":"2020-09-15T22:15:46Z","timestamp":1600208146000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Polynomial Evaluation on Superscalar Architecture, Applied to the Elementary Function\n            <i>e<\/i>\n            <sup>\n              <i>x<\/i>\n            <\/sup>"],"prefix":"10.1145","volume":"46","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3436-1766","authenticated-orcid":false,"given":"Timoth\u00e9e","family":"Ewart","sequence":"first","affiliation":[{"name":"Blue Brain Project, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Gen\u00e8ve, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francesco","family":"Cremonesi","sequence":"additional","affiliation":[{"name":"Blue Brain Project, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Gen\u00e8ve, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felix","family":"Sch\u00fcrmann","sequence":"additional","affiliation":[{"name":"Blue Brain Project, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Gen\u00e8ve, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabien","family":"Delalondre","sequence":"additional","affiliation":[{"name":"Blue Brain Project, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Gen\u00e8ve, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,9,15]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/NORCHP.2011.6126735"},{"key":"e_1_2_1_2_1","volume-title":"Baker and Peter Graves-Morris","author":"George","year":"1996","unstructured":"George A. Baker and Peter Graves-Morris . 1996 . Pad\u00e9 Approximants (2nd ed.). Cambridge University Press . DOI:https:\/\/doi.org\/10.1017\/CBO9780511530074 10.1017\/CBO9780511530074 George A. Baker and Peter Graves-Morris. 1996. Pad\u00e9 Approximants (2nd ed.). Cambridge University Press. DOI:https:\/\/doi.org\/10.1017\/CBO9780511530074"},{"key":"e_1_2_1_3_1","first-page":"1","article-title":"Autotuning in high-performance computing applications","volume":"99","author":"Balaprakash Prasanna","year":"2018","unstructured":"Prasanna Balaprakash , Jack Dongarra , Todd Gamblin , Mary Hall , Jeffrey K. Hollingsworth , Boyana Norris , and Richard Vuduc . 2018 . Autotuning in high-performance computing applications . Proc. IEEE 99 (2018), 1 -- 16 . Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc. 2018. Autotuning in high-performance computing applications. Proc. IEEE 99 (2018), 1--16.","journal-title":"Proc. IEEE"},{"volume-title":"Proceedings of the 20th IEEE Symposium on Computer Arithmetic (ARITH\u201911)","author":"Boersma M.","key":"e_1_2_1_4_1","unstructured":"M. Boersma , M. Kroner , C. Layer , P. Leber , S. M. Muller , and K. Schelm . 2011. The POWER7 binary floating-point unit . In Proceedings of the 20th IEEE Symposium on Computer Arithmetic (ARITH\u201911) . 87--91. M. Boersma, M. Kroner, C. Layer, P. Leber, S. M. Muller, and K. Schelm. 2011. The POWER7 binary floating-point unit. In Proceedings of the 20th IEEE Symposium on Computer Arithmetic (ARITH\u201911). 87--91."},{"key":"e_1_2_1_5_1","unstructured":"T. Agerwala and J. Cocke. 1987. High Performance Reduced Instruction Set Processors. IBM Watson Research Center.  T. Agerwala and J. Cocke. 1987. High Performance Reduced Instruction Set Processors. IBM Watson Research Center."},{"key":"e_1_2_1_6_1","volume-title":"Sollya: An environment for the development of numerical codes. In Mathematical Software - ICMS 2010 (Lecture Notes in Computer Science)","author":"Chevillard S.","year":"2010","unstructured":"S. Chevillard , M. Jolde\u015f , and C. Lauter . 2010 . Sollya: An environment for the development of numerical codes. In Mathematical Software - ICMS 2010 (Lecture Notes in Computer Science) , K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama (Eds.), Vol. 6327 . Springer ,Germany, 28--31. S. Chevillard, M. Jolde\u015f, and C. Lauter. 2010. Sollya: An environment for the development of numerical codes. In Mathematical Software - ICMS 2010 (Lecture Notes in Computer Science), K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama (Eds.), Vol. 6327. Springer,Germany, 28--31."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2017.2703870"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.62.0239"},{"volume-title":"Proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM\u201913)","author":"Dukhan Marat","key":"e_1_2_1_9_1","unstructured":"Marat Dukhan and Richard W. Vuduc . 2013. Methods for high-throughput computation of elementary functions . In Proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM\u201913) , Revised Selected Papers, Part I. 86--95. Marat Dukhan and Richard W. Vuduc. 2013. Methods for high-throughput computation of elementary functions. In Proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM\u201913), Revised Selected Papers, Part I. 86--95."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1977.1674900"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the International Workshop on Managing Requirements Knowledge. 33","author":"Estrin Gerald","year":"1960","unstructured":"Gerald Estrin . 1960 . Organization of computer systems\u2014The fixed plus variable structure computer . In Proceedings of the International Workshop on Managing Requirements Knowledge. 33 . Gerald Estrin. 1960. Organization of computer systems\u2014The fixed plus variable structure computer. In Proceedings of the International Workshop on Managing Requirements Knowledge. 33."},{"key":"e_1_2_1_12_1","volume-title":"Cyme: A library maximizing SIMD computation on user-defined containers","author":"Ewart Timoth\u00e9e","year":"2014","unstructured":"Timoth\u00e9e Ewart , Fabien Delalondre , and Felix Sch\u00fcrmann . 2014 . Cyme: A library maximizing SIMD computation on user-defined containers . In Supercomputing, Julian Martin Kunkel, Thomas Ludwig, and Hans Werner Meuer (Eds.). Lecture Notes in Computer Science, Vol. 8488 . Springer International Publishing , 440--449. Timoth\u00e9e Ewart, Fabien Delalondre, and Felix Sch\u00fcrmann. 2014. Cyme: A library maximizing SIMD computation on user-defined containers. In Supercomputing, Julian Martin Kunkel, Thomas Ludwig, and Hans Werner Meuer (Eds.). Lecture Notes in Computer Science, Vol. 8488. Springer International Publishing, 440--449."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2832087.2832088"},{"volume-title":"Code generation: Evaluating polynomials","author":"Fateman Richard J.","key":"e_1_2_1_14_1","unstructured":"Richard J. Fateman . 2002. Code generation: Evaluating polynomials . University of California , Berkeley. Retrieved from http:\/\/people.eecs.berkeley.edu\/~fateman\/papers\/polyval.pdf. Richard J. Fateman. 2002. Code generation: Evaluating polynomials. University of California, Berkeley. Retrieved from http:\/\/people.eecs.berkeley.edu\/~fateman\/papers\/polyval.pdf."},{"key":"e_1_2_1_15_1","unstructured":"Agner Fog. 1996-2016. The microarchitecture of Intel AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers. Retrieved from http:\/\/www.agner.org\/optimize\/microarchitecture.pdf.  Agner Fog. 1996-2016. The microarchitecture of Intel AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers. Retrieved from http:\/\/www.agner.org\/optimize\/microarchitecture.pdf."},{"key":"e_1_2_1_16_1","unstructured":"Agner Fog. 2018. Instruction tables. Retrieved from http:\/\/www.agner.org\/optimize\/instruction_tables.pdf.  Agner Fog. 2018. Instruction tables. Retrieved from http:\/\/www.agner.org\/optimize\/instruction_tables.pdf."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/321281.321282"},{"key":"e_1_2_1_18_1","volume-title":"Wheatley","author":"Gerald Curtis F.","year":"2004","unstructured":"Curtis F. Gerald and Patrick O . Wheatley . 2004 . Applied Numerical Analysis. Pearson\/Addison-Wesley . Curtis F. Gerald and Patrick O. Wheatley. 2004. Applied Numerical Analysis. Pearson\/Addison-Wesley."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/103162.103163"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356053"},{"key":"e_1_2_1_21_1","unstructured":"HiPEAC 2015. Fast Exponential Computation on SIMD Architectures. HiPEAC.  HiPEAC 2015. Fast Exponential Computation on SIMD Architectures. HiPEAC."},{"key":"e_1_2_1_22_1","unstructured":"Intel. 2009--2012. Intel Architecture Code Analyser. Retrieved from https:\/\/software.intel.com\/en-us\/articles\/intel-architecture-code-analyzer.  Intel. 2009--2012. Intel Architecture Code Analyser. Retrieved from https:\/\/software.intel.com\/en-us\/articles\/intel-architecture-code-analyzer."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3121432"},{"key":"e_1_2_1_24_1","unstructured":"W. Kahan. 2002. On the Cost of Floating-point Computation without Extra-precise Arithmetic. Retrieved from https:\/\/people.eecs.berkeley.edu\/ wkahan\/Qdrtcs.pdf.  W. Kahan. 2002. On the Cost of Floating-point Computation without Extra-precise Arithmetic. Retrieved from https:\/\/people.eecs.berkeley.edu\/ wkahan\/Qdrtcs.pdf."},{"volume-title":"Elementary Mathematics from an Advanced Standpoint","author":"Klein Felix","key":"e_1_2_1_25_1","unstructured":"Felix Klein . 1932. Elementary Mathematics from an Advanced Standpoint . MacMillan and Co. Limited . Felix Klein. 1932. Elementary Mathematics from an Advanced Standpoint. MacMillan and Co. Limited."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/355580.369074"},{"key":"e_1_2_1_27_1","volume-title":"The Art of Computer Programming","author":"Knuth Donald E.","unstructured":"Donald E. Knuth . 1997. The Art of Computer Programming , Volume 2 ( 3 rd ed.): Seminumerical Algorithms. Addison-Wesley Longman Publishing Co. , Inc., Boston, MA. Donald E. Knuth. 1997. The Art of Computer Programming, Volume 2 (3rd ed.): Seminumerical Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.","edition":"3"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.cs.04.060190.001133"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.2016.7869070"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(78)90041-7"},{"key":"e_1_2_1_32_1","unstructured":"Sparsh Mittal. 2018. A Survey of Techniques for Dynamic Branch. Retrieved from https:\/\/arxiv.org\/abs\/1804.00261.  Sparsh Mittal. 2018. A Survey of Techniques for Dynamic Branch. Retrieved from https:\/\/arxiv.org\/abs\/1804.00261."},{"key":"e_1_2_1_33_1","unstructured":"S. L. Moshier. 2000. Cephes Math Library. Retrieved from http:\/\/www.moshier.net.  S. L. Moshier. 2000. Cephes Math Library. Retrieved from http:\/\/www.moshier.net."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.2011.39"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2646-6"},{"key":"e_1_2_1_36_1","unstructured":"Jean-Michel Muller. 2005. On the Definition of ulp(x). Retrieved from http:\/\/www.ens-lyon.fr\/LIP\/Pub\/Rapports\/RR\/RR2005\/RR2005-09.pdf.  Jean-Michel Muller. 2005. On the Definition of ulp(x). Retrieved from http:\/\/www.ens-lyon.fr\/LIP\/Pub\/Rapports\/RR\/RR2005\/RR2005-09.pdf."},{"volume-title":"Elementary Functions","author":"Muller Jean-Michel","key":"e_1_2_1_37_1","unstructured":"Jean-Michel Muller . 2006. Elementary Functions . Springer . Jean-Michel Muller. 2006. Elementary Functions. Springer."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1975-0388757-3"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(94)90076-0"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2870650.2870653"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpa.3160250405"},{"volume-title":"Investigation of Different Methods of Fast Polynomial Evaluation. Master\u2019s thesis","author":"Reynolds Gavin S.","key":"e_1_2_1_42_1","unstructured":"Gavin S. Reynolds . 2010. Investigation of Different Methods of Fast Polynomial Evaluation. Master\u2019s thesis . The University of Edinburgh . Gavin S. Reynolds. 2010. Investigation of Different Methods of Fast Polynomial Evaluation. Master\u2019s thesis. The University of Edinburgh."},{"key":"e_1_2_1_43_1","unstructured":"Hugues De Lassus Saint-Genies. 2018. Elementary Functions: Towards Automatically Generated Efficient and Vectorizable Implementations. Ph.D. Dissertation. Universit\u00e9 de Perpignan.  Hugues De Lassus Saint-Genies. 2018. Elementary Functions: Towards Automatically Generated Efficient and Vectorizable Implementations. Ph.D. Dissertation. Universit\u00e9 de Perpignan."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-010-0108-2"},{"key":"e_1_2_1_45_1","unstructured":"Lol Software. 2012. Remez exchange toolbox. Retrieved from http:\/\/lolengine.net\/wiki\/doc\/maths\/remez.  Lol Software. 2012. Remez exchange toolbox. Retrieved from http:\/\/lolengine.net\/wiki\/doc\/maths\/remez."},{"key":"e_1_2_1_46_1","first-page":"2","article-title":"Table-driven implementation of the exponential function in IEEE floating-point arithmetic","volume":"15","author":"Peter Tang Ping-Tak","year":"1989","unstructured":"Ping-Tak Peter Tang . 1989 . Table-driven implementation of the exponential function in IEEE floating-point arithmetic . ACM Trans. Math. Softw. 15 , 2 (June 1989), 144--157. Ping-Tak Peter Tang. 1989. Table-driven implementation of the exponential function in IEEE floating-point arithmetic. ACM Trans. Math. Softw. 15, 2 (June 1989), 144--157.","journal-title":"ACM Trans. Math. Softw."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.1991.145565"},{"key":"e_1_2_1_48_1","volume-title":"Josuttis","author":"Vandevoorde David","year":"2002","unstructured":"David Vandevoorde and Nicolai M . Josuttis . 2002 . C++ Templates : The Complete Guide (1st ed.). Addison-Wesley Professional . David Vandevoorde and Nicolai M. Josuttis. 2002. C++ Templates: The Complete Guide (1st ed.). Addison-Wesley Professional."}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3408893","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3408893","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:58Z","timestamp":1750193278000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3408893"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,15]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3408893"],"URL":"https:\/\/doi.org\/10.1145\/3408893","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"type":"print","value":"0098-3500"},{"type":"electronic","value":"1557-7295"}],"subject":[],"published":{"date-parts":[[2020,9,15]]},"assertion":[{"value":"2018-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}