{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T05:23:34Z","timestamp":1775453014288,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,13]],"date-time":"2021-02-13T00:00:00Z","timestamp":1613174400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,13]],"date-time":"2021-02-13T00:00:00Z","timestamp":1613174400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cryptogr Eng"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Applications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate <jats:inline-formula><jats:alternatives><jats:tex-math>$$1.9\\,\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>1.9<\/mml:mn>\n                    <mml:mspace\/>\n                    <mml:mo>\u00d7<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> improvement in decryption throughput in comparison with OpenSSL and <jats:inline-formula><jats:alternatives><jats:tex-math>$$1.5\\,\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>1.5<\/mml:mn>\n                    <mml:mspace\/>\n                    <mml:mo>\u00d7<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> improvement in modular exponentiation throughput compared to GMP-6.1.2 on an Intel Xeon CPU. In addition, we show <jats:inline-formula><jats:alternatives><jats:tex-math>$$1.4\\,\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>1.4<\/mml:mn>\n                    <mml:mspace\/>\n                    <mml:mo>\u00d7<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> improvement in decryption throughput in comparison with state-of-the-art vector implementations on many-core Knights Landing Xeon Phi hardware. Finally, we show how interleaving Chinese remainder theorem-based RSA calculations within our parallel BPS technique halves decryption latency while providing protection against fault-injection attacks.<\/jats:p>","DOI":"10.1007\/s13389-021-00256-9","type":"journal-article","created":{"date-parts":[[2021,2,14]],"date-time":"2021-02-14T13:47:08Z","timestamp":1613310428000},"page":"95-105","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Parallel modular multiplication using 512-bit advanced vector instructions"],"prefix":"10.1007","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3019-4523","authenticated-orcid":false,"given":"Benjamin","family":"Buhrow","sequence":"first","affiliation":[]},{"given":"Barry","family":"Gilbert","sequence":"additional","affiliation":[]},{"given":"Clifton","family":"Haider","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,2,13]]},"reference":[{"key":"256_CR1","doi-asserted-by":"publisher","unstructured":"Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of checking cryptographic protocols for faults (extended abstract). In: Advances in Cryptology\u2014EUROCRYPT \u201997, International Conference on the Theory and Application of Cryptographic Techniques, May 11\u201315, 1997, Lecture Notes in Computer Science, vol. 1233, pp. 37\u201351. Springer (1997). https:\/\/doi.org\/10.1007\/3-540-69053-0_4","DOI":"10.1007\/3-540-69053-0_4"},{"key":"256_CR2","doi-asserted-by":"publisher","unstructured":"Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Selected Areas in Cryptography\u2014SAC, August 14\u201316, 2013, pp. 471\u2013489 (2013). https:\/\/doi.org\/10.1007\/978-3-662-43414-7_24","DOI":"10.1007\/978-3-662-43414-7_24"},{"key":"256_CR3","doi-asserted-by":"publisher","unstructured":"Chang, C., Yao, S., Yu, D.: Vectorized big integer operations for cryptosystems on the Intel mic architecture. In: 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 194\u2013203 (2015). https:\/\/doi.org\/10.1109\/HiPC.2015.54","DOI":"10.1109\/HiPC.2015.54"},{"key":"256_CR4","unstructured":"Drucker, N., Gueron, S.: Fast modular squaring with AVX512IFMA. Cryptology ePrint Archive, Report 2018\/335 (2018). http:\/\/eprint.iacr.org\/2018\/335"},{"key":"256_CR5","doi-asserted-by":"publisher","unstructured":"Emmart, N., Luitjens, J., Weems, C., Woolley, C.: Optimizing modular multiplication for NVIDIA\u2019s Maxwell GPUs. In: 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp. 47\u201354 (2016). https:\/\/doi.org\/10.1109\/ARITH.2016.21","DOI":"10.1109\/ARITH.2016.21"},{"key":"256_CR6","doi-asserted-by":"publisher","unstructured":"Emmart, N., Weems, C.: Pushing the performance envelope of modular exponentiation across multiple generations of GPUs. In: 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 166\u2013176 (2015). https:\/\/doi.org\/10.1109\/IPDPS.2015.69","DOI":"10.1109\/IPDPS.2015.69"},{"key":"256_CR7","doi-asserted-by":"publisher","unstructured":"Emmart, N., Zhengt, F., Weems, C.: Faster modular exponentiation using double precision floating point arithmetic on the GPU. In: 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp. 130\u2013137 (2018). https:\/\/doi.org\/10.1109\/ARITH.2018.8464792","DOI":"10.1109\/ARITH.2018.8464792"},{"key":"256_CR8","unstructured":"Fog, A.: Instruction tables. Tech. rep., Technical University of Denmark (2018). https:\/\/www.agner.org\/optimize\/instruction_tables.pdf"},{"issue":"1","key":"256_CR9","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1007\/s13389-012-0031-5","volume":"2","author":"S Gueron","year":"2012","unstructured":"Gueron, S.: Efficient software implementations of modular exponentiation. J. Cryptogr. Eng. 2(1), 31\u201343 (2012). https:\/\/doi.org\/10.1007\/s13389-012-0031-5","journal-title":"J. Cryptogr. Eng."},{"key":"256_CR10","doi-asserted-by":"publisher","unstructured":"Gueron, S., Krasnov, V.: Software implementation of modular exponentiation, using advanced vector instructions architectures. In: Arithmetic of Finite Fields\u20144th International Workshop, WAIFI 2012, July 16\u201319, 2012, Lecture Notes in Computer Science, vol. 7369, pp. 119\u2013135. Springer (2012). https:\/\/doi.org\/10.1007\/978-3-642-31662-3_9","DOI":"10.1007\/978-3-642-31662-3_9"},{"key":"256_CR11","doi-asserted-by":"publisher","unstructured":"Gueron, S., Krasnov, V.: Accelerating big integer arithmetic using Intel IFMA extensions. In: 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp. 32\u201338 (2016). https:\/\/doi.org\/10.1109\/ARITH.2016.22","DOI":"10.1109\/ARITH.2016.22"},{"key":"256_CR12","unstructured":"Intel: Intel VTune Amplifier 2019 user guide. https:\/\/software.intel.com\/en-us\/vtune-amplifier-help. Accessed 05 June 2019"},{"key":"256_CR13","first-page":"293","volume":"145","author":"A Karatsuba","year":"1962","unstructured":"Karatsuba, A., Ofman, Y.: Multiplication of many-digital numbers by automatic computers. Proc. USSR Acad. Sci. 145, 293\u2013294 (1962)","journal-title":"Proc. USSR Acad. Sci."},{"issue":"3","key":"256_CR14","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/40.502403","volume":"16","author":"C Kaya Koc","year":"1996","unstructured":"Kaya Koc, C., Acar, T., Kaliski, B.S.: Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 16(3), 26\u201333 (1996). https:\/\/doi.org\/10.1109\/40.502403","journal-title":"IEEE Micro"},{"key":"256_CR15","doi-asserted-by":"publisher","unstructured":"Keliris, A., Maniatakos, M.: Investigating large integer arithmetic on Intel Xeon Phi SIMD extensions. In: 2014 9th IEEE International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS), pp. 1\u20136 (2014). https:\/\/doi.org\/10.1109\/DTIS.2014.6850661","DOI":"10.1109\/DTIS.2014.6850661"},{"key":"256_CR16","doi-asserted-by":"publisher","unstructured":"Kleinjung, T., Aoki, K., Franke, J., Lenstra, A.K., Thom\u00e9, E., Bos, J.W., Gaudry, P., Kruppa, A., Montgomery, P.L., Osvik, D.A., te\u00a0Riele, H.J.J., Timofeev, A., Zimmermann, P.: Factorization of a 768-bit RSA modulus. In: Advances in Cryptology\u2014CRYPTO, August 15-19, 2010, Lecture Notes in Computer Science, vol. 6223, pp. 333\u2013350. Springer (2010). https:\/\/doi.org\/10.1007\/978-3-642-14623-7_18","DOI":"10.1007\/978-3-642-14623-7_18"},{"issue":"3","key":"256_CR17","doi-asserted-by":"publisher","first-page":"649","DOI":"10.2307\/1971363","volume":"126","author":"HW Lenstra","year":"1987","unstructured":"Lenstra, H.W.: Factoring integers with elliptic curves. Ann. Math. 126(3), 649\u2013673 (1987)","journal-title":"Ann. Math."},{"key":"256_CR18","volume-title":"Handbook of Applied Cryptography","author":"AJ Menezes","year":"2001","unstructured":"Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (2001)"},{"key":"256_CR19","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1090\/S0025-5718-1985-0777282-X","volume":"44","author":"PL Montgomery","year":"1985","unstructured":"Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519\u2013521 (1985)","journal-title":"Math. Comput."},{"key":"256_CR20","unstructured":"MPI, O.: Open source high performance computing. https:\/\/www.open-mpi.org\/. Accessed 23 April 2019"},{"key":"256_CR21","unstructured":"OpenSSL: Cryptography and SSL\/TLS toolkit. http:\/\/www.openssl.org\/. Accessed 22 April 2019"},{"key":"256_CR22","unstructured":"Orisaka, G., Aranha, D.F., L\u00f3pez, J.F.A.: Finite field arithmetic using AVX-512 for isogeny-based cryptography. In: XVIII Simposio Brasileiro de Seguranca da Informacao e Sistemas Computacionais (SBSeg 2018), pp. 49\u201356 (2018)"},{"issue":"11","key":"256_CR23","doi-asserted-by":"publisher","first-page":"1474","DOI":"10.1109\/TC.2004.100","volume":"53","author":"D Page","year":"2004","unstructured":"Page, D., Smart, N.P.: Parallel cryptographic arithmetic using a redundant Montgomery representation. IEEE Trans. Comput. 53(11), 1474\u20131482 (2004). https:\/\/doi.org\/10.1109\/TC.2004.100","journal-title":"IEEE Trans. Comput."},{"key":"256_CR24","doi-asserted-by":"publisher","unstructured":"Rauzy, P., Guilley, S.: Countermeasures against high-order fault-injection attacks on CRT-RSA. In: 2014 Workshop on Fault Diagnosis and Tolerance in Cryptography, pp. 68\u201382 (2014). https:\/\/doi.org\/10.1109\/FDTC.2014.17","DOI":"10.1109\/FDTC.2014.17"},{"key":"256_CR25","unstructured":"Reinders, J.: Intel AVX-512 instructions. Tech. rep., INTEL (2017). https:\/\/software.intel.com\/en-us\/blogs\/2013\/avx-512-instructions"},{"issue":"2","key":"256_CR26","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1145\/359340.359342","volume":"21","author":"RL Rivest","year":"1978","unstructured":"Rivest, R.L., Shamir, A., Adleman, L.: A method of obtaining digital signature and public key cryptosystems. Commun. ACM 21(2), 120\u2013126 (1978)","journal-title":"Commun. ACM"},{"key":"256_CR27","unstructured":"Sidorenko, A., van\u00a0den Berg, J., Foekema, R., Grashuis, M., de\u00a0Vos, J.: Bellcore attack in practice. Cryptology ePrint Archive, Report 2012\/553 (2012). http:\/\/eprint.iacr.org\/2012\/553"},{"key":"256_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.parco.2018.02.002","volume":"75","author":"D Takahashi","year":"2018","unstructured":"Takahashi, D.: Computation of the 100 quadrillionth hexadecimal digit of $$\\pi $$ on a cluster of Intel Xeon Phi processors. Parallel Comput. 75, 1\u201310 (2018). https:\/\/doi.org\/10.1016\/j.parco.2018.02.002","journal-title":"Parallel Comput."},{"key":"256_CR29","doi-asserted-by":"crossref","unstructured":"Yarom, Y., Genkin, D., Heninger, N.: Cachebleed: A timing attack on OpenSSL constant time RSA. IACR Cryptology ePrint Archive 2016, 224 (2016). http:\/\/eprint.iacr.org\/2016\/224","DOI":"10.1007\/978-3-662-53140-2_17"},{"key":"256_CR30","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Pan, W., Lin, J., Liu, P., Xue, C., Zheng, F.: Phirsa: exploiting the computing power of vector instructions on Intel Xeon Phi for RSA. pp. 482\u2013500 (2017)","DOI":"10.1007\/978-3-319-69453-5_26"}],"container-title":["Journal of Cryptographic Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13389-021-00256-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13389-021-00256-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13389-021-00256-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,15]],"date-time":"2022-03-15T13:40:55Z","timestamp":1647351655000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13389-021-00256-9"}},"subtitle":["RSA fault-injection countermeasure via interleaved parallel multiplication"],"short-title":[],"issued":{"date-parts":[[2021,2,13]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["256"],"URL":"https:\/\/doi.org\/10.1007\/s13389-021-00256-9","relation":{},"ISSN":["2190-8508","2190-8516"],"issn-type":[{"value":"2190-8508","type":"print"},{"value":"2190-8516","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,13]]},"assertion":[{"value":"10 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 February 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}