{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T14:28:01Z","timestamp":1781533681905,"version":"3.54.5"},"reference-count":42,"publisher":"International Association for Cryptologic Research","license":[{"start":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:00:00Z","timestamp":1719878400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100018833","name":"Agence de l'innovation de d\u00e9fense","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100018833","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IACR CiC"],"accepted":{"date-parts":[[2024,9,2]]},"abstract":"<jats:p>\n                    This paper presents software implementations of batch computations, dealing with multi-precision integer operations. In this work, we use the Single Instruction Multiple Data (SIMD) AVX512 instruction set of the x86-64 processors, in particular the vectorized fused multiplier-adder VPMADD52. We focus on batch multiplications, squarings, modular multiplications, modular squarings and constant time modular exponentiations of 8 values using a word-slicing storage. We explore the use of Schoolbook and Karatsuba approaches with operands up to 4108 and 4154 bits respectively. We also introduce a truncated multiplication that speeds up the computation of the Montgomery modular reduction in the context of software implementation. Our Truncated Montgomery modular multiplication improvement offers speed gains of almost 20 % over the conventional non-truncated versions. Compared to the state-of-the-art GMP and OpenSSL libraries, our speedup modular operations are more than 4 times faster.  Compared to OpenSSL BN_mod_exp_mont_consttimex2 using AVX512 and madd52* (madd52hi or madd52lo) in 256-bit registers, in fixed-window exponentiations of sizes\n                    <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                      <mml:mrow>\n                        <mml:mn>1024<\/mml:mn>\n                      <\/mml:mrow>\n                    <\/mml:math>\n                    and\n                    <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                      <mml:mrow>\n                        <mml:mn>2048<\/mml:mn>\n                      <\/mml:mrow>\n                    <\/mml:math>\n                    , our 512-bit implementation provides speedups of respectively 1.75 and 1.38, while the 256-bit version speedups are 1.51 and 1.05 for\n                    <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                      <mml:mrow>\n                        <mml:mn>1024<\/mml:mn>\n                      <\/mml:mrow>\n                    <\/mml:math>\n                    and\n                    <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                      <mml:mrow>\n                        <mml:mn>2048<\/mml:mn>\n                      <\/mml:mrow>\n                    <\/mml:math>\n                    -bit sizes (batch of 4 values in this case).\n                  <\/jats:p>","DOI":"10.62056\/a3txl86bm","type":"journal-article","created":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T11:13:33Z","timestamp":1728299613000},"update-policy":"https:\/\/doi.org\/10.62056\/adfjwm02dj","source":"Crossref","is-referenced-by-count":1,"title":["Truncated multiplication and batch software SIMD AVX512 implementation for faster Montgomery multiplications and modular exponentiation"],"prefix":"10.62056","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8658-0064","authenticated-orcid":false,"given":"Laurent-St\u00e9phane","family":"Didier","sequence":"first","affiliation":[{"id":[{"id":"https:\/\/ror.org\/038a20b58","id-type":"ROR","asserted-by":"publisher"}],"name":"Toulon","place":["83130, France"],"department":["Laboratoire IMath, Universit\u00e9 de Toulon"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3840-584X","authenticated-orcid":false,"given":"Nadia","family":"Mrabet","sequence":"additional","affiliation":[{"id":[{"id":"https:\/\/ror.org\/05a1dws80","id-type":"ROR","asserted-by":"publisher"}],"name":"Saint-Etienne","place":["42100, France"],"department":["Mines Saint-Etienne, CEA-LETI, Centre CMP, Department SAS"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0966-0503","authenticated-orcid":false,"given":"L\u00e9a","family":"Glandus","sequence":"additional","affiliation":[{"id":[{"id":"https:\/\/ror.org\/038a20b58","id-type":"ROR","asserted-by":"publisher"}],"name":"Toulon","place":["83130, France"],"department":["Laboratoire IMath, Universit\u00e9 de Toulon"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9634-5729","authenticated-orcid":false,"given":"Jean-Marc","family":"Robert","sequence":"additional","affiliation":[{"id":[{"id":"https:\/\/ror.org\/038a20b58","id-type":"ROR","asserted-by":"publisher"}],"name":"Toulon","place":["83130, France"],"department":["Laboratoire IMath, Universit\u00e9 de Toulon"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"48349","published-online":{"date-parts":[[2024,10,7]]},"reference":[{"key":"ref1:RivestSA78","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1145\/359340.359342","article-title":"A Method for Obtaining Digital Signatures and\n  Public-Key Cryptosystems","volume":"21","author":"R. L. Rivest","year":"1978","journal-title":"Commun. ACM"},{"key":"ref2:BruceShcheierThese","isbn-type":"print","volume-title":"Applied Cryptography - Protocols, Algorithms, and\n  Source Code in C, 2nd Edition","author":"B. Schneier","year":"1996","ISBN":"https:\/\/id.crossref.org\/isbn\/9780471117094"},{"key":"ref3:ECC99","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781107360211","volume-title":"Elliptic Curves in Cryptography","author":"I. F. Blake","year":"1999"},{"key":"ref4:sike","isbn-type":"print","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/978-3-030-75245-3_2","article-title":"An Alternative Approach for SIDH Arithmetic","author":"C. Bouvier","year":"2021","ISBN":"https:\/\/id.crossref.org\/isbn\/9783030752453"},{"key":"ref5:gnu_mp","volume-title":"GNU Multiple Precision Arithmetic Library 6.1.2","author":"T. Granlund"},{"key":"ref6:gueron2012software","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1007\/978-3-642-31662-3_9","article-title":"Software Implementation of Modular Exponentiation,\n  using Advanced Vector Instructions Architectures","author":"S. Gueron","year":"2012"},{"key":"ref7:gueron2016accelerating","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1109\/ARITH.2016.22","article-title":"Accelerating Big Integer Arithmetic using Intel\n  IFMA Extensions","author":"S. Gueron","year":"2016"},{"key":"ref8:NDruckerGK18","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1109\/ARITH.2018.8464777","article-title":"Fast Multiplication of Binary Polynomials with the\n  Forthcoming Vectorized VPCLMULQDQ Instruction","author":"N. Drucker","year":"2018"},{"key":"ref9:bmsz13","isbn-type":"print","doi-asserted-by":"publisher","first-page":"471","DOI":"10.1007\/978-3-662-43414-7_24","article-title":"Montgomery Multiplication Using Vector\n  Instructions","author":"J. W. Bos","year":"2014","ISBN":"https:\/\/id.crossref.org\/isbn\/9783662434147"},{"key":"ref10:tg19","isbn-type":"print","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-14070-0_1","article-title":"Fast Modular Squaring with AVX512IFMA","author":"N. Drucker","year":"2019","ISBN":"https:\/\/id.crossref.org\/isbn\/9783030140700"},{"key":"ref11:et20","isbn-type":"print","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1007\/978-3-030-38991-8_5","article-title":"Accelerating Large Integer Multiplication Using\n  Intel AVX-512IFMA","author":"T. Edamatsu","year":"2020","ISBN":"https:\/\/id.crossref.org\/isbn\/9783030389918"},{"key":"ref12:tak20","isbn-type":"print","doi-asserted-by":"publisher","first-page":"655","DOI":"10.1007\/978-3-030-58814-4_52","article-title":"Fast Multiple Montgomery Multiplications Using Intel\n  AVX-512IFMA Instructions","author":"D. Takahashi","year":"2020","ISBN":"https:\/\/id.crossref.org\/isbn\/9783030588144"},{"key":"ref13:bernstein2009billion","first-page":"131","article-title":"The Billion-Mulmod-Per-Second PC","volume":"9","author":"D. J. Bernstein","year":"2009"},{"key":"ref14:trei2013efficient","volume-title":"Efficient Modular Arithmetic for SIMD Devices","author":"Wilke Trei","year":"2013"},{"key":"ref15:mahe2014fast","article-title":"Fast GPGPU-Based Elliptic Curve Scalar\n  Multiplication","author":"E. M Mah\u00e9","year":"2014","journal-title":"Cryptology ePrint Archive"},{"key":"ref16:emmart2016optimizing","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1109\/ARITH.2016.21","article-title":"Optimizing Modular Multiplication for Nvidia's\n  Maxwell GPUs","author":"N. Emmart","year":"2016"},{"key":"ref17:emmart2018faster","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1109\/ARITH.2018.8464792","article-title":"Faster Modular Exponentiation using Double Precision\n  Floating Point Arithmetic on the GPU","author":"N. Emmart","year":"2018"},{"key":"ref18:antao2010elliptic","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1109\/ASAP.2010.5541000","article-title":"Elliptic Curve Point Multiplication on GPUs","author":"S. Ant\u00e3o","year":"2010"},{"key":"ref19:bos2012low","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1007\/s10766-012-0198-5","article-title":"Low-latency Elliptic Curve Scalar Multiplication","volume":"40","author":"J. W. Bos","year":"2012","journal-title":"International Journal of Parallel Programming"},{"key":"ref20:GrabherGP2008","volume-title":"On Software Parallel Implementation of Cryptographic\n  Pairings","author":"P. Grabher","year":"2008"},{"key":"ref21:buhrow2022parallel","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s13389-021-00256-9","article-title":"Parallel Modular Multiplication using 512-bit Advanced\n  Vector Instructions: RSA Fault-Injection Countermeasure via\n  Interleaved Parallel Multiplication","volume":"12","author":"B. Buhrow","year":"2022","journal-title":"Journal of Cryptographic Engineering"},{"key":"ref22:cheng2021batching","doi-asserted-by":"publisher","first-page":"618","DOI":"10.46586\/tches.v2021.i4.618-649","article-title":"Batching CSIDH Group Actions using AVX-512","volume":"2021","author":"H. Cheng","year":"2021","journal-title":"IACR Transactions on Cryptographic Hardware and Embedded\n  Systems (TCHES)"},{"key":"ref23:cheng2022highly","doi-asserted-by":"publisher","first-page":"41","DOI":"10.46586\/tches.v2022.i2.41-68","article-title":"Highly Vectorized SIKE for AVX-512","volume":"2022","author":"H. Cheng","year":"2022","journal-title":"IACR Transactions on Cryptographic Hardware and Embedded\n  Systems (TCHES)"},{"key":"ref24:openssl","volume-title":"OpenSSL","author":"The OpenSSL Project"},{"key":"ref25:intelavx10","volume-title":"Intel Advanced Vector extensions 10","author":"Architecture specification"},{"key":"ref26:barrett","series-title":"Lecture Notes in Computer Science","isbn-type":"print","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1007\/3-540-47721-7_24","article-title":"Implementing the Rivest Shamir and Adleman Public\n  Key Encryption Algorithm on a Standard Digital Signal\n  Processor","volume":"263","author":"P. Barrett","year":"1986","ISBN":"https:\/\/id.crossref.org\/isbn\/9783540180470","journal-title":"Advances in Cryptology \u2014 CRYPTO' 86"},{"key":"ref27:montMult_85","doi-asserted-by":"publisher","first-page":"519","DOI":"10.2307\/2007970","article-title":"Modular Multiplication Without Trial Division","volume":"44","author":"P. L. Montgomery","year":"1985","journal-title":"Mathematics of Computation"},{"key":"ref28:kocAT96","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/40.502403","article-title":"Analyzing and Comparing Montgomery Multiplication\n  Algorithms","volume":"16","author":"C. Kaya Koc","year":"1996","journal-title":"IEEE Micro"},{"key":"ref29:Hars2005","isbn-type":"print","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/11545262_16","article-title":"Fast Truncated Multiplication for Cryptographic\n  Applications","author":"L. Hars","year":"2005","ISBN":"https:\/\/id.crossref.org\/isbn\/9783540319405"},{"key":"ref30:Hars2006","doi-asserted-by":"publisher","first-page":"61721","DOI":"10.1155\/2007\/61721","article-title":"Applications of Fast Truncated Multiplication in\n  Cryptography","volume":"2007","author":"L. Hars","year":"2006","journal-title":"EURASIP Journal on Embedded Systems","ISSN":"https:\/\/id.crossref.org\/issn\/1687-3963","issn-type":"electronic"},{"key":"ref31:DingS2018","doi-asserted-by":"publisher","first-page":"1713","DOI":"10.1109\/TCSII.2017.2771239","article-title":"A Modular Multiplier Implemented With Truncated\n  Multiplication","volume":"65","author":"J. Ding","year":"2018","journal-title":"IEEE Transactions on Circuits and Systems II:\n  Express Briefs"},{"key":"ref32:DingS2020","doi-asserted-by":"publisher","first-page":"1319","DOI":"10.1109\/TCSII.2019.2932328","article-title":"A Low-Latency and Low-Cost Montgomery Modular\n  Multiplier Based on NLP Multiplication","volume":"67","author":"J. Ding","year":"2020","journal-title":"IEEE Transactions on Circuits and Systems II: Express\n  Briefs"},{"key":"ref33:BosKP2021","series-title":"Lecture note series","isbn-type":"print","doi-asserted-by":"crossref","DOI":"10.1017\/9781108854207","volume-title":"Computational Cryptography: Algorithmic Aspects of\n  Cryptology","author":"J. W. Bos","year":"2021","ISBN":"https:\/\/id.crossref.org\/isbn\/9781108795937"},{"key":"ref34:intelintrinsics","volume-title":"Intel Intrinsics Guide","author":"Intel"},{"key":"ref35:throttling2017","volume-title":"On the dangers of intel\u2019s frequency scaling","author":"V. Krasnov","year":"2017"},{"key":"ref36:RobertV22","doi-asserted-by":"publisher","DOI":"10.1007\/s13389-021-00278-3","article-title":"Faster Multiplication over ${\\mathbb {F}}_2[X]$ using\n  AVX512 Instruction Set and VPCLMULQDQ Instruction","author":"J. -M. Robert","year":"2022","journal-title":"Journal of Cryptographic Engineering"},{"key":"ref37:neonintrinsics","volume-title":"Neon Intrinsics Reference","author":"ARM"},{"key":"ref38:mrabetJ2017","series-title":"Chapman and Hall\/CRC Cryptography and Network Security\n  Series","isbn-type":"print","doi-asserted-by":"crossref","DOI":"10.1201\/9781315370170","volume-title":"Guide to Pairing-Based Cryptography","author":"N.E. Mrabet","year":"2017","ISBN":"https:\/\/id.crossref.org\/isbn\/9781498729512"},{"key":"ref39:DuquesneL2006","series-title":"Discrete Mathematics and Its Applications","isbn-type":"print","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1201\/9781420034981.ch24","volume-title":"Handbook of Elliptic and Hyperelliptic Curve Cryptography","author":"S. Duquesne","year":"2006","ISBN":"https:\/\/id.crossref.org\/isbn\/1584885181"},{"key":"ref40:FeoKLPW2020","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-64837-4_3","volume-title":"SQISign: compact post-quantum signatures from quaternions\n  and isogenies","author":"L. De Feo","year":"2020"},{"key":"ref41:CoronMNT2010","isbn-type":"print","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1007\/978-3-642-22792-9_28","article-title":"Fully Homomorphic Encryption over the Integers with Shorter\n  Public Keys","author":"J.-S. Coron","year":"2011","ISBN":"https:\/\/id.crossref.org\/isbn\/9783642227929"},{"key":"ref42:DyerDX2019","doi-asserted-by":"publisher","first-page":"549","DOI":"10.1007\/s10207-019-00427-0","article-title":"Practical homomorphic encryption over the integers for\n  secure computation in the cloud","volume":"18","author":"J. Dyer","year":"2019","journal-title":"International Journal of Information Security","ISSN":"https:\/\/id.crossref.org\/issn\/1615-5270","issn-type":"electronic"}],"container-title":["IACR Communications in Cryptology"],"original-title":[],"language":"en","deposited":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T16:28:12Z","timestamp":1733848092000},"score":1,"resource":{"primary":{"URL":"https:\/\/cic.iacr.org\/p\/1\/3\/11"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,7]]},"references-count":42,"URL":"https:\/\/doi.org\/10.62056\/a3txl86bm","archive":["Internet Archive","Internet Archive"],"relation":{},"ISSN":["3006-5496"],"issn-type":[{"value":"3006-5496","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,7]]},"assertion":[{"value":"2024-07-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"cc1-3-32"}}