{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:26:50Z","timestamp":1773246410751,"version":"3.50.1"},"reference-count":40,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T00:00:00Z","timestamp":1613520000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Many low-cost platforms that support floating-point arithmetic, such as microcontrollers and field-programmable gate arrays, do not include fast hardware or software methods for calculating the square root and\/or reciprocal square root. Typically, such functions are implemented using direct lookup tables or polynomial approximations, with a subsequent application of the Newton\u2013Raphson method. Other, more complex solutions include high-radix digit-recurrence and bipartite or multipartite table-based methods. In contrast, this article proposes a simple modification of the fast inverse square root method that has high accuracy and relatively low latency. Algorithms are given in C\/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions. These are based on the switching of magic constants in the initial approximation, depending on the input interval of the normalized floating-point numbers, in order to minimize the maximum relative error on each subinterval after the first iteration\u2014giving 13 correct bits of the result. Our experimental results show that the proposed algorithms provide a fairly good trade-off between accuracy and latency after two iterations for numbers of type float, and after three iterations for numbers of type double when using fused multiply\u2013add instructions\u2014giving almost complete accuracy.<\/jats:p>","DOI":"10.3390\/computation9020021","type":"journal-article","created":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T04:49:01Z","timestamp":1613537341000},"page":"21","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants"],"prefix":"10.3390","volume":"9","author":[{"given":"Leonid V.","family":"Moroz","sequence":"first","affiliation":[{"name":"Information Technologies Security Department, Lviv Polytechnic National University, 79013 Lviv, Ukraine"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2344-2576","authenticated-orcid":false,"given":"Volodymyr V.","family":"Samotyy","sequence":"additional","affiliation":[{"name":"Automation and Information Technologies Department, Cracow University of Technology, 31155 Cracow, Poland"},{"name":"Information Security Management Department, Lviv State University of Life Safety, 79007 Lviv, Ukraine"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4948-458X","authenticated-orcid":false,"given":"Oleh Y.","family":"Horyachyy","sequence":"additional","affiliation":[{"name":"Information Technologies Security Department, Lviv Polytechnic National University, 79013 Lviv, Ukraine"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1109\/MSP.2005.1406500","article-title":"A root of less evil digital signal processing","volume":"22","author":"Allie","year":"2005","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_2","unstructured":"Parhami, B. (2010). Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hasnat, A., Bhattacharyya, T., Dey, A., Halder, S., and Bhattacharjee, D. (2017). A fast FPGA based architecture for computation of square root and Inverse Square Root. 2017 Devices Integr. Circuit (DevIC), 383\u2013387.","DOI":"10.1109\/DEVIC.2017.8073975"},{"key":"ref_4","unstructured":"Beebe, N.H.F. (2017). The Mathematical-Function Computation Handbook: Programming Using the MathCW Portable Software Library, Springer International Publishing. [1st ed.]."},{"key":"ref_5","unstructured":"Loosemore, S., Stallman, R., McGrath, R., Oram, A., and Drepper, U. (2020). The GNU C Library Reference Manual for Version 2.31, Free Software Foundation Inc.. Available online: https:\/\/www.gnu.org\/software\/libc\/manual\/pdf\/libc.pdf."},{"key":"ref_6","unstructured":"(2020, May 27). Raspberry Pi 3 Model B. RS Components: Corby, UK. Available online: https:\/\/www.alliedelec.com\/m\/d\/4252b1ecd92888dbb9d8a39b536e7bf2.pdf."},{"key":"ref_7","unstructured":"(2020, December 19). Floating Point Unit Demonstration on STM32 Microcontrollers; Application Note AN4044, DocID022737 Rev 2; STMicroelectronics N.V., May 2016. Available online: https:\/\/www.st.com\/resource\/en\/application_note\/dm00047230-floating-point-unit-demonstration-on-stm32-microcontrollers-stmicroelectronics.pdf."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.sysarc.2017.06.005","article-title":"Cholesky factorization on SIMD multi-core architectures","volume":"79","author":"Lemaitre","year":"2017","journal-title":"J. Syst. Arch."},{"key":"ref_9","unstructured":"Fog, A. (2020). Instruction Tables: Lists of Instruction Latencies, Throughputs and Micro-Operation Breakdowns for Intel, AMD and VIA CPUs, Technical University of Denmark. Available online: https:\/\/www.agner.org\/optimize\/instruction_tables.pdf."},{"key":"ref_10","unstructured":"(2019). Intel 64 and IA-32 Architectures Software Developer\u2019s Manual, Intel Corp.. Available online: https:\/\/software.intel.com\/sites\/default\/files\/managed\/39\/c5\/325462-sdm-vol-1-2abcd-3abcd.pdf."},{"key":"ref_11","unstructured":"(2016). ARM NEON Intrinsics Reference, ARM Ltd.. IHI 0073B."},{"key":"ref_12","unstructured":"(2010). Xtensa Instruction Set Architecture (ISA), Tensilica Inc.. Available online: https:\/\/usermanual.wiki\/Document\/Xtensa2020ASSEMBLER20GUIDE.1231659642\/view."},{"key":"ref_13","unstructured":"(2019). Intel Cyclone 10 GX Device Overview, Intel Corp.. Available online: https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/cyclone-10\/c10gx-51001.pdf."},{"key":"ref_14","unstructured":"Yi, J.J., Joshi, A., Sendag, R., Eeckhout, L., and Lilja, D.J. (2006, January 23). Analyzing the Processor Bottlenecks in SPEC CPU 2000. Proceedings of the 2006 SPEC Benchmark Workshop, Austin, TX, USA."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2136","DOI":"10.1109\/JPROC.2020.2991885","article-title":"Elementary Functions and Approximate Computing","volume":"108","author":"Muller","year":"2020","journal-title":"Proc. IEEE"},{"key":"ref_16","unstructured":"Muller, J.-M. (2006). Elementary Functions: Algorithms and Implementation, Birkh\u00e4user. [2nd ed.]."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Muller, J.-M., Brunie, N., de Dinechin, F., Jeannerod, C.-P., Joldes, M., Lef\u00e8vre, V., Melquiond, G., Revol, N., and Torres, S. (2018). Handbook of Floating-Point Arithmetic, Birkh\u00e4user. [2nd ed.].","DOI":"10.1007\/978-3-319-76526-6"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1109\/TC.2019.2947899","article-title":"Low Latency Floating-Point Division and Square Root Unit","volume":"69","author":"Bruguera","year":"2020","journal-title":"IEEE Trans. Comput."},{"key":"ref_19","unstructured":"Cornea-Hasegan, M.A., Golliver, R.A., and Markstein, P. (1999, January 14\u201316). Correctness proofs outline for Newton-Raphson based floating-point divide and square root algorithms. Proceedings of the 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336), Adelaide, Australia."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Eberly, D.H. (2015). GPGPU Programming for Games and Science, CRC Press.","DOI":"10.1201\/b17296"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1023\/A:1009984523264","article-title":"A Few Results on Table-Based Methods","volume":"5","author":"Muller","year":"1999","journal-title":"Dev. Reliab. Comput."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1109\/12.795125","article-title":"Approximating elementary functions with symmetric bipartite tables","volume":"48","author":"Schulte","year":"1999","journal-title":"IEEE Trans. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1109\/TC.2005.54","article-title":"Multipartite table methods","volume":"54","author":"Tisserand","year":"2005","journal-title":"IEEE Trans. Comput."},{"key":"ref_24","first-page":"80","article-title":"Floating-point tricks","volume":"17","author":"Blinn","year":"1997","journal-title":"IEEE Eng. Med. Boil. Mag."},{"key":"ref_25","unstructured":"Lomont, C. (2003). Fast Inverse Square Root, Purdue University. Available online: http:\/\/www.lomont.org\/papers\/2003\/InvSqrt.pdf."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"461","DOI":"10.47839\/ijc.18.4.1616","article-title":"Simple effective fast inverse square root algorithm with two magic constants","volume":"18","author":"Horyachyy","year":"2019","journal-title":"Int. J. Comput."},{"key":"ref_27","unstructured":"(1999). Quake III Arena, Id Software Inc.. Available online: https:\/\/github.com\/id-Software\/Quake-III-Arena\/blob\/master\/code\/game\/q_math.c#L552."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.amc.2017.08.025","article-title":"Fast calculation of inverse square root with the use of magic constant\u2013analytical approach","volume":"316","author":"Moroz","year":"2018","journal-title":"Appl. Math. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Walczyk, C.J., Moroz, L.V., and Cie\u015bli\u0144ski, J.L. (2021). Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton\u2013Raphson Corrections. Entropy, 23.","DOI":"10.3390\/e23010086"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, J., Xu, Z., Nukada, A., Maruyama, N., and Matsuoka, S. (2017, January 14\u201317). Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor. Proceedings of the 2017 46th International Conference on Parallel Processing (ICPP), Bristol, UK.","DOI":"10.1109\/ICPP.2017.52"},{"key":"ref_31","unstructured":"Carlile, B., Delamarter, G., Kinney, P., Marti, A., and Whitney, B. (2017). Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). arXiv, Available online: https:\/\/arxiv.org\/pdf\/1710.09967.pdf."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Moroz, L., Samotyy, V., Horyachyy, O., and Dzelendzyak, U. (2019, January 18\u201321). Algorithms for Calculating the Square Root and Inverse Square Root Based on the Second-Order Householder\u2019s Method. Proceedings of the 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Metz, France.","DOI":"10.1109\/IDAACS.2019.8924302"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zafar, S., and Adapa, R. (2014, January 9\u201311). Hardware architecture design and mapping of Fast Inverse Square Root algorithm. Proceedings of the 2014 International Conference on Advances in Electrical Engineering (ICAEE), Vellore, India.","DOI":"10.1109\/ICAEE.2014.6838433"},{"key":"ref_34","first-page":"645","article-title":"Novel detector implementations for 3G LTE downlink and uplink","volume":"78","author":"Janhunen","year":"2013","journal-title":"Analog. Integr. Circuits Signal Process."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hsu, C.-J., Chen, J.-L., and Chen, L.-G. (2015, January 24\u201326). An efficient hardware implementation of HON4D feature extraction for real-time action recognition. Proceedings of the 2015 International Symposium on Consumer Electronics (ISCE), Madrid, Spain.","DOI":"10.1109\/ISCE.2015.7177775"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1109\/TBCAS.2014.2376956","article-title":"A UWB Radar Signal Processing Platform for Real-Time Human Respiratory Feature Extraction Based on Four-Segment Linear Waveform Model","volume":"10","author":"Hsieh","year":"2015","journal-title":"IEEE Trans. Biomed. Circuits Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sangeetha, D., and Deepa, P. (2017, January 7\u201311). Efficient Scale Invariant Human Detection Using Histogram of Oriented Gradients for IoT Services. Proceedings of the 2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID), Hyderabad, India.","DOI":"10.1109\/VLSID.2017.60"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Moroz, L., Samotyy, V., and Horyachyy, O. (2018, January 20\u201321). An Effective Floating-Point Reciprocal. Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), Lviv, Ukraine.","DOI":"10.1109\/IDAACS-SWS.2018.8525803"},{"key":"ref_39","unstructured":"(2018). ESP32-WROOM-32 (ESP-WROOM-32) Datasheet, Espressif Systems. Available online: https:\/\/www.mouser.com\/datasheet\/2\/891\/esp-wroom-32_datasheet_en-1223836.pdf."},{"key":"ref_40","unstructured":"(2020). Intel Stratix 10 GX\/SX Device Overview, Intel Corp.. Available online: https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/stratix-10\/s10-overview.pdf."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/9\/2\/21\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:25:02Z","timestamp":1760160302000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/9\/2\/21"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,17]]},"references-count":40,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["computation9020021"],"URL":"https:\/\/doi.org\/10.3390\/computation9020021","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,17]]}}}