{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T22:11:22Z","timestamp":1744150282862},"reference-count":0,"publisher":"Universitatsbibliothek der Ruhr-Universitat Bochum","license":[{"start":{"date-parts":[[2022,8,31]],"date-time":"2022-08-31T00:00:00Z","timestamp":1661904000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TCHES"],"abstract":"<jats:p>The extended GCD (XGCD) calculation, which computes B\u00e9zout coefficients ba, bb such that ba \u2217 a0 + bb \u2217 b0 = GCD(a0, b0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software implementations. The few works investigating hardware acceleration build on variants of Euclid\u2019s division-based algorithm, following the approach used in optimized software. We show that adopting variants of Stein\u2019s subtraction-based algorithm instead leads to significantly faster hardware. We quantify this advantage by performing a large-integer XGCD accelerator design space exploration comparing Euclid- and Stein-based algorithms for various application requirements. This exploration leads us to an XGCD hardware accelerator that is flexible and efficient, supports fast average and constant-time evaluation, and is easily extensible for polynomial GCD. Our 16nm ASIC design calculates 1024-bit XGCD in 294ns (8x faster than the state-of-the-art ASIC) and constant-time 255-bit XGCD for inverses in the field of integers modulo the prime 2255\u221219 in 85ns (31\u00d7 faster than state-of-the-art software). We believe our design is the first high-performance ASIC for the XGCD computation that is also capable of constant-time evaluation. Our work is publicly available at https:\/\/github.com\/kavyasreedhar\/sreedhar-xgcd-hardware-ches2022.<\/jats:p>","DOI":"10.46586\/tches.v2022.i4.163-187","type":"journal-article","created":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T05:55:26Z","timestamp":1662011726000},"page":"163-187","source":"Crossref","is-referenced-by-count":5,"title":["Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion"],"prefix":"10.46586","author":[{"given":"Kavya","family":"Sreedhar","sequence":"first","affiliation":[]},{"given":"Mark","family":"Horowitz","sequence":"additional","affiliation":[]},{"given":"Christopher","family":"Torng","sequence":"additional","affiliation":[]}],"member":"25480","published-online":{"date-parts":[[2022,8,31]]},"container-title":["IACR Transactions on Cryptographic Hardware and Embedded Systems"],"original-title":[],"link":[{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/download\/9817\/9322","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/download\/9817\/9322","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T05:55:27Z","timestamp":1662011727000},"score":1,"resource":{"primary":{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/view\/9817"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,31]]},"references-count":0,"URL":"https:\/\/doi.org\/10.46586\/tches.v2022.i4.163-187","relation":{},"ISSN":["2569-2925"],"issn-type":[{"value":"2569-2925","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,31]]}}}