{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T14:14:31Z","timestamp":1754144071775,"version":"3.41.2"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,10,2]],"date-time":"2024-10-02T00:00:00Z","timestamp":1727827200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,2]],"date-time":"2024-10-02T00:00:00Z","timestamp":1727827200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Numer Algor"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The square root is one of the most used functions in many different engineering and scientific applications. We propose new methods for calculating the square root function that are based on the Newton\u2013Raphson method with Heron iteration. A modification of Heron\u2019s formula combined with an improved selection of the magic constants enables a significant reduction of the maximum relative error (MRE). Simple modifications to the Newton\u2013Raphson formula and the magic number method enable implementation on platforms with limited hardware resources, such as microcontrollers and FPGAs, with variable accuracy. Implementations of new approximation algorithms in the C programming language were carefully tested and evaluated against their software and hardware counterparts on the most popular platforms, e.g., CPUs from Intel, AMD and ARM, GPU from Nvidia and IPU from Graphcore. 
The proposed numerical algorithms are shown to be superior in terms of computational time, the number of clock cycles, accuracy, MRE, and root mean square deviation.<\/jats:p>","DOI":"10.1007\/s11075-024-01932-7","type":"journal-article","created":{"date-parts":[[2024,10,2]],"date-time":"2024-10-02T05:02:01Z","timestamp":1727845321000},"page":"1805-1828","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Fast and accurate approximation algorithms for computing floating point square root"],"prefix":"10.1007","volume":"99","author":[{"given":"Zbigniew","family":"Kokosi\u0144ski","sequence":"first","affiliation":[]},{"given":"Pawe\u0142","family":"Gepner","sequence":"additional","affiliation":[]},{"given":"Leonid","family":"Moroz","sequence":"additional","affiliation":[]},{"given":"Volodymyr","family":"Samotyy","sequence":"additional","affiliation":[]},{"given":"Mariusz","family":"W\u0119grzyn","sequence":"additional","affiliation":[]},{"given":"Nataliia","family":"Gavkalova","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,2]]},"reference":[{"issue":"1","key":"1932_CR1","doi-asserted-by":"publisher","first-page":"20540","DOI":"10.1038\/s41598-022-25039-y","volume":"12","author":"A Altamimi","year":"2022","unstructured":"Altamimi, A., Youssef, B.B.: Novel seed generation and quadrature-based square rooting algorithms. Sci. Rep. 12(1), 20540 (2022). https:\/\/doi.org\/10.1038\/s41598-022-25039-y","journal-title":"Sci. Rep."},{"key":"1932_CR2","doi-asserted-by":"publisher","unstructured":"Andrews, M.: Mathematical microprocessor software: a $$\\sqrt{(}x)$$ comparison. IEEE Micro 2(2), 63\u201379 (1982). 
https:\/\/doi.org\/10.1109\/MM.1982.290970","DOI":"10.1109\/MM.1982.290970"},{"key":"1932_CR3","doi-asserted-by":"publisher","unstructured":"Anghel, C., Paleologu, C., Benesty, J., Ciochin\u0103, S.: FPGA implementation of a variable step-size affine projection algorithm for acoustic echo cancellation. In: 18th European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, 23-27 August 2010, pp. 532\u2013536 (2010). https:\/\/doi.org\/10.5281\/zenodo.41864","DOI":"10.5281\/zenodo.41864"},{"key":"1932_CR4","unstructured":"Anghel, C., Ciochina, S.: On the FPGA implementation of the VR-RLS algorithms. In: The Sixteenth International Conference on Networks (ICN 2017), Venice, Italy, 23\u201327 April, 2017, pp. 98\u2013101. Available online: https:\/\/api.semanticscholar.org\/CorpusID:250447098 (2017)"},{"key":"1932_CR5","doi-asserted-by":"publisher","unstructured":"Beebe, N.H.F.: The mathematical-function computation handbook: programming using the mathCW portable software library. Springer-Verlag: Berlin, pp. 215-242 (Roots) (2017). https:\/\/doi.org\/10.1007\/978-3-319-64110-2","DOI":"10.1007\/978-3-319-64110-2"},{"issue":"4","key":"1932_CR6","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1109\/38.595279","volume":"17","author":"JF Blinn","year":"1997","unstructured":"Blinn, J.F.: Floating-point tricks. IEEE Comput. Graph. Appl. 17(4), 80\u201384 (1997). https:\/\/doi.org\/10.1109\/38.595279","journal-title":"IEEE Comput. Graph. Appl."},{"key":"1932_CR7","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1017\/S0962492922000101","volume":"32","author":"S Boldo","year":"2023","unstructured":"Boldo, S., Jeannerod, C.-P., Melquiond, G., Muller, J.M.: Floating-point arithmetic. Acta Numer. 32, 203\u2013290 (2023). https:\/\/doi.org\/10.1017\/S0962492922000101","journal-title":"Acta Numer."},{"key":"1932_CR8","doi-asserted-by":"publisher","unstructured":"Bruguera, J.D.: Low latency floating-point division and square root unit. IEEE Trans. 
Comput. 69(2), 274\u2013287 (2020). https:\/\/doi.org\/10.1109\/TC.2019.2947899","DOI":"10.1109\/TC.2019.2947899"},{"key":"1932_CR9","doi-asserted-by":"publisher","unstructured":"Chen, J., Xue, L., Anderson J.H.: Software-specified FPGA accelerators for elementary functions. In: 2018 International conference on Field-Programmable Technology (FPT), Naha, Japan, 10\u201314 December, pp. 54-61 (2018). https:\/\/doi.org\/10.1109\/FPT.2018.00019","DOI":"10.1109\/FPT.2018.00019"},{"key":"1932_CR10","unstructured":"Chen, J.: Hardware acceleration for elementary functions and RISC-V processor. Ph.D. Thesis, McGill University, Montreal, QC, Canada (2020)"},{"issue":"3","key":"1932_CR11","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1080\/00207160008804985","volume":"75","author":"RA Chowdhury","year":"2006","unstructured":"Chowdhury, R.A., Kaykobad, M.: Calculating the square root with arbitrary order of convergence. Int. J. Comput. Math. 75(3), 297\u2013302 (2006). https:\/\/doi.org\/10.1080\/00207160008804985","journal-title":"Int. J. Comput. Math."},{"key":"1932_CR12","unstructured":"CMSIS DSP Software Library: CMSIS-DSP Version 1.10.0. Available online: https:\/\/www.keil.com\/pack\/doc\/CMSIS\/DSP\/html\/index.html"},{"key":"1932_CR13","unstructured":"Crawford, J.A.: Computing square roots. AM1 LLC, U11891. Available online: http:\/\/www.am1.us\/ (2005)"},{"key":"1932_CR14","unstructured":"Detmer, R.C.: Introduction to 80x86 assembly language and computer architecture, 3rd edn., pp. 99\u2013122. Jones, Bartlett Learning (2006)"},{"key":"1932_CR15","doi-asserted-by":"publisher","unstructured":"Dutta, S., Tavva, Y., Bhattacharjee, D., Chattopadhyay, A.: Efficient quantum circuits for square-root and inverse square-root. In: 2020 33rd International conference on VLSI design and 2020 19th International conference on embedded systems (VLSID), Bangalore, India, pp. 55\u201360 (2020). 
https:\/\/doi.org\/10.1109\/VLSID49098.2020.00027","DOI":"10.1109\/VLSID49098.2020.00027"},{"key":"1932_CR16","unstructured":"van Eekelen, M., Frumin, D., Geuvers, H., Gondelman, L., Krebbers, R., Schoolderman, M., Smetsers, S., Verbeek, F., Viguier, B., Wiedijk, F.: A benchmark for C program verification. arXiv:1904.01009 (2019)"},{"key":"1932_CR17","doi-asserted-by":"publisher","unstructured":"Ercegovac, M.D., Lang T.: Digital Arithmetics. Chap. 7. \u2018Reciprocal, Division, Reciprocal Square Root, and Square Root by Iterative Approximation\u2019, pp. 366\u2013395. Chap. 11. \u2018Cordic algorithm and implementations\u2019, pp. 608\u2013648. Morgan Kaufmann (2004). https:\/\/doi.org\/10.1016\/B978-155860798-9\/50009-9","DOI":"10.1016\/B978-155860798-9\/50009-9"},{"key":"1932_CR18","unstructured":"Fog, A.: Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs. Available online: https:\/\/www.agner.org\/optimize\/instruction_tables.pdf (2019)"},{"key":"1932_CR19","doi-asserted-by":"publisher","unstructured":"Gepner, P., Gamayunov, V., Fraser L.: Effective implementation of DGEMM on modern multicore CPU. In: International Conference on Computational Science, ICCS 2012 (2012). https:\/\/doi.org\/10.1016\/j.procs.2012.04.014","DOI":"10.1016\/j.procs.2012.04.014"},{"key":"1932_CR20","doi-asserted-by":"publisher","unstructured":"Gepner, P., Fraser, D., Kowalik M.: Second generation quad-core Intel Xeon processors bring 45 nm technology and a new level of performance to HPC applications. In: International Conference on Computational Science, ICCS 2008 (2008). https:\/\/doi.org\/10.1007\/978-3-540-69384-0_47","DOI":"10.1007\/978-3-540-69384-0_47"},{"key":"1932_CR21","unstructured":"Granlund, T.: Instruction latencies and throughput for AMD and Intel x86 processors. 
Available online: https:\/\/gmplib.org\/~tege\/x86-timing.pdf (2019)"},{"key":"1932_CR22","doi-asserted-by":"crossref","unstructured":"Gustafsson, O., Wanhammar, L.: Square root computation. Polynomial and piecewise polynomial approximations. In: Arithmetic circuits for DSP applications. Meher, P.K., Stouraitis, T. (eds.) IEEE Press, Wiley, pp. 27\u201329 (2017)","DOI":"10.1002\/9781119206804.ch1"},{"key":"1932_CR23","doi-asserted-by":"publisher","unstructured":"Gustafsson, O., Hellman, N.: Approximate floating-point operations with integer units by processing in the logarithmic domain. In: 2021 IEEE 28th Symposium on computer arithmetic (ARITH), Lyngby, Denmark, 14\u201316 June, 2021, pp. 45\u201352 (2021).https:\/\/doi.org\/10.1109\/ARITH51176.2021.00019","DOI":"10.1109\/ARITH51176.2021.00019"},{"key":"1932_CR24","doi-asserted-by":"publisher","unstructured":"Hasnat, A., Bhattacharyya, T., Dey, A., Halder, S., Bhattacharjee D.: A fast FPGA based architecture for computation of square root and inverse square root. In: 2017 Devices for Integrated Circuit (DevIC), Kalyani, India, 23\u201324 March, pp. 383-387 (2017). https:\/\/doi.org\/10.1109\/DEVIC.2017.8073975","DOI":"10.1109\/DEVIC.2017.8073975"},{"issue":"4","key":"1932_CR25","doi-asserted-by":"publisher","first-page":"1197","DOI":"10.1109\/TC.2015.2441714","volume":"65","author":"M Joldes","year":"2015","unstructured":"Joldes, M., Marty, O., Muller, J., Popescu, V.: Arithmetic algorithms for extended precision using floating-point expansions. IEEE Trans. Comput. 65(4), 1197\u20131210 (2015). https:\/\/doi.org\/10.1109\/TC.2015.2441714","journal-title":"IEEE Trans. Comput."},{"issue":"4","key":"1932_CR26","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1145\/279232.279237","volume":"23","author":"AH Karp","year":"1997","unstructured":"Karp, A.H., Markstein, P.: High-precision division and square root. ACM Trans. Math. Softw. 23(4), 561\u2013589 (1997). 
https:\/\/doi.org\/10.1145\/279232.279237","journal-title":"ACM Trans. Math. Softw."},{"issue":"1","key":"1932_CR27","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.tcs.2005.09.056","volume":"351","author":"P Kornerup","year":"2006","unstructured":"Kornerup, P., Muller, J.-M.: Choosing starting values for certain Newton-Raphson iterations. Theor. Comput. Sci. 351(1), 101\u2013110 (2006). https:\/\/doi.org\/10.1016\/j.tcs.2005.09.056","journal-title":"Theor. Comput. Sci."},{"key":"1932_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.sysarc.2017.06.005","volume":"79","author":"F Lemaitre","year":"2017","unstructured":"Lemaitre, F., Couturier, B., Lacassagne, L.: Cholesky factorization on SIMD multi-core architectures. J. Syst. Architect. 79, 1\u201315 (2017). https:\/\/doi.org\/10.1016\/j.sysarc.2017.06.005","journal-title":"J. Syst. Architect."},{"key":"1932_CR29","unstructured":"Lomont, C.: Fast inverse square root. Purdue University, Tech. Rep. Available online: http:\/\/www.lomont.org\/Math\/Papers\/2003\/InvSqrt.pdf (2003)"},{"issue":"1","key":"1932_CR30","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1049\/IP-E.1990.0003","volume":"137","author":"P Montuschi","year":"1990","unstructured":"Montuschi, P., Mezzalama, M.: Survey of square rooting algorithms. IEE Proc. Comput. Digit. Tech. 137(1), 31\u201340 (1990). https:\/\/doi.org\/10.1049\/IP-E.1990.0003","journal-title":"IEE Proc. Comput. Digit. Tech."},{"key":"1932_CR31","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1007\/BF02239012","volume":"46","author":"P Montuschi","year":"1991","unstructured":"Montuschi, P., Mezzalama, M.: Optimal absolute error starting values for Newton-Raphson calculation of square root. Computing 46, 67\u201386 (1991). 
https:\/\/doi.org\/10.1007\/BF02239012","journal-title":"Computing"},{"key":"1932_CR32","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1016\/j.amc.2017.08.025","volume":"316","author":"L Moroz","year":"2018","unstructured":"Moroz, L., Walczyk, C.J., Hrynchyshyn, A., Holimath, V., Cie\u015bli\u0144ski, J.L.: Fast calculation of inverse square root with the use of magic constant - Analytical approach. Appl. Math. Comput. 316, 245\u2013255 (2018). https:\/\/doi.org\/10.1016\/j.amc.2017.08.025","journal-title":"Appl. Math. Comput."},{"key":"1932_CR33","doi-asserted-by":"publisher","unstructured":"Moroz, L., Samotyy, V., Horyachyy O., Dzelendzyak, U.: Algorithms for calculating the square root and inverse square root based on the second-order Householder\u2019s method. In: Proceedings of the 2019 10th IEEE International conference on Intelligent Data Acquisition and Advanced Computing Systems: technology and applications (IDAACS), Metz, France, 18\u201321 September, 2019, pp. 436-442 (2019). https:\/\/doi.org\/10.1109\/IDAACS.2019.8924302","DOI":"10.1109\/IDAACS.2019.8924302"},{"issue":"2","key":"1932_CR34","doi-asserted-by":"publisher","first-page":"21","DOI":"10.3390\/computation9020021","volume":"9","author":"L Moroz","year":"2021","unstructured":"Moroz, L., Samotyy, V., Horyachyy, O.: Modified fast inverse square root and square root approximation algorithms: The method of switching magic constants. Computation 9(2), 21 (2021). https:\/\/doi.org\/10.3390\/computation9020021","journal-title":"Computation"},{"key":"1932_CR35","doi-asserted-by":"publisher","unstructured":"Moroz, L., Samotyy, V., W\u0119grzyn, M., Dzelendzyak, U.: Efficient floating-point square root and reciprocal square root algorithms. In: 2021 11th IEEE International conference on Intelligent Data Acquisition and Advanced Computing Systems: technology and applications (IDAACS), Cracow, Poland, 22\u201325 September, 2022, Vol. 1, pp. 552\u2013559 (2022). 
https:\/\/doi.org\/10.1109\/IDAACS53288.2021.9660872","DOI":"10.1109\/IDAACS53288.2021.9660872"},{"key":"1932_CR36","doi-asserted-by":"publisher","unstructured":"Mostefa, M.B., Boussaid, A., Khezzar, A.: FPGA-based algorithm for harmonic current mitigation. In: 2022 2nd International Conference on Advanced Electrical Engineering (ICAEE), Constantine, Algeria, 29\u201331 October, 2022, pp. 1\u20135 (2022). https:\/\/doi.org\/10.1109\/ICAEE53772.2022.9962021","DOI":"10.1109\/ICAEE53772.2022.9962021"},{"issue":"7","key":"1932_CR37","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1145\/363427.363454","volume":"10","author":"DG Moursund","year":"1967","unstructured":"Moursund, D.G.: Optimal starting values for Newton-Raphson calculation of $$sqrt(x)$$. Comm. ACM 10(7), 430\u2013432 (1967). https:\/\/doi.org\/10.1145\/363427.363454","journal-title":"Comm. ACM"},{"key":"1932_CR38","doi-asserted-by":"publisher","unstructured":"Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lef\u00e8vre, V., Melquiond, G., Revol, N., Stehl\u00e9, D., Torres, S.: Handbook of Floating-Point Arithmetic, 2nd edn. Basel, Switzerland, Birkh\u00e4user (2018). https:\/\/doi.org\/10.1007\/978-0-8176-4705-6","DOI":"10.1007\/978-0-8176-4705-6"},{"key":"1932_CR39","doi-asserted-by":"publisher","unstructured":"Muller, J.-M.: Elementary functions and approximate computing. Proc. IEEE 108(12), 2136\u20132149 (2020). https:\/\/doi.org\/10.1109\/jproc.2020.2991885","DOI":"10.1109\/jproc.2020.2991885"},{"issue":"5","key":"1932_CR40","doi-asserted-by":"publisher","first-page":"052027","DOI":"10.1088\/1742-6596\/513\/5\/052027","volume":"513","author":"D Piparo","year":"2014","unstructured":"Piparo, D., Innocente, V., Hauth, T.: Speeding up HEP experiment software with a library of fast and auto-vectorisable mathematical functions. J. Physics: Conf. Ser. 513(5), 052027 (2014). https:\/\/doi.org\/10.1088\/1742-6596\/513\/5\/052027","journal-title":"J. Physics: Conf. 
Ser."},{"key":"1932_CR41","doi-asserted-by":"publisher","unstructured":"Walczyk, C., Moroz, L., Cie\u015bli\u0144ski, J.: Improving the accuracy of the fast inverse square root by modifying Newton-Raphson corrections. Entropy 23(1), 86 (2021). https:\/\/doi.org\/10.3390\/e23010086","DOI":"10.3390\/e23010086"},{"key":"1932_CR42","doi-asserted-by":"publisher","unstructured":"Wang, S., Deng, X., Liu, W., Li, Y., Chen, S. Chen, Liu, L.: FPGA-based acceleration of structured light depth estimation. In: 2022 China Automation Congress (CAC), Xiamen, China, 25\u201327 November, 2022, pp. 4191\u20134196 (2022). https:\/\/doi.org\/10.1109\/CAC57257.2022.10055770","DOI":"10.1109\/CAC57257.2022.10055770"},{"key":"1932_CR43","doi-asserted-by":"publisher","unstructured":"Wei, J., Kuwana, A., Kobayashi, H., Kubo, K., Tanaka, Y.: Floating-point inverse square root algorithm based on Taylor-series expansion. IEEE Trans. Circ. Syst. II: Express Briefs 68(7), 2640\u20132644 (2021). https:\/\/doi.org\/10.1109\/TCSII.2021.3062358","DOI":"10.1109\/TCSII.2021.3062358"},{"key":"1932_CR44","doi-asserted-by":"publisher","unstructured":"Yasin, A., Pillement, T., Ciesielski, S.: Functional verification of hardware dividers using algebraic model. In: 2019 IFIP\/IEEE 27th International conference on Very Large Scale Integration (VLSI-SoC), Cuzco, Peru, 6\u20139 October, 2019, pp. 257\u2013262 (2019). 
https:\/\/doi.org\/10.1109\/VLSISoC.2019.8920335","DOI":"10.1109\/VLSISoC.2019.8920335"}],"container-title":["Numerical Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11075-024-01932-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11075-024-01932-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11075-024-01932-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T07:11:40Z","timestamp":1752563500000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11075-024-01932-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,2]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["1932"],"URL":"https:\/\/doi.org\/10.1007\/s11075-024-01932-7","relation":{},"ISSN":["1017-1398","1572-9265"],"issn-type":[{"type":"print","value":"1017-1398"},{"type":"electronic","value":"1572-9265"}],"subject":[],"published":{"date-parts":[[2024,10,2]]},"assertion":[{"value":"3 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"There is no approval 
committee for our research and the submitted article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}