{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T02:04:09Z","timestamp":1768615449519,"version":"3.49.0"},"reference-count":0,"publisher":"Universitatsbibliothek der Ruhr-Universitat Bochum","issue":"1","license":[{"start":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T00:00:00Z","timestamp":1768521600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TCHES"],"abstract":"<jats:p>Falcon, an NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work aims to address this bottleneck. We design a vectorized version of the BaseSampler and provide optimized implementations across six different instruction sets: SSE2, AVX2, AVX-512F, NEON, RISC-V Vector (RVV), and RV64IM. The AVX2 implementation, for instance, achieves an 8.4x speedup over prior work. Additionally, we optimize the FFT\/iFFT operations using RVV and RV64D. For the RVV implementation, we introduce a new method using strided load\/store instructions, with 4+4 and 4+5 layer merging strategies for Falcon-{512,1024}, respectively, resulting in a speedup of more than 4x. Finally, we present the results of our optimized implementations across eight different instruction sets for signature generation of Falcon. 
For instance, our AVX2, AVX-512F, and RV64GCVB implementations achieve performance improvements of 23%, 36%, and 59%, respectively, for signature generation of Falcon-512.<\/jats:p>","DOI":"10.46586\/tches.v2026.i1.302-324","type":"journal-article","created":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T15:13:15Z","timestamp":1768576395000},"page":"302-324","source":"Crossref","is-referenced-by-count":0,"title":["Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV"],"prefix":"10.46586","volume":"2026","author":[{"given":"Jipeng","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Jiaheng","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"25480","published-online":{"date-parts":[[2026,1,16]]},"container-title":["IACR Transactions on Cryptographic Hardware and Embedded Systems"],"original-title":[],"link":[{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/download\/12678\/12366","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/download\/12678\/12366","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T15:13:16Z","timestamp":1768576396000},"score":1,"resource":{"primary":{"URL":"https:\/\/tches.iacr.org\/index.php\/TCHES\/article\/view\/12678"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,16]]},"references-count":0,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1,16]]}},"URL":"https:\/\/doi.org\/10.46586\/tches.v2026.i1.302-324","relation":{},"ISSN":["2569-2925"],"issn-type":[{"value":"2569-2925","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,16]]}}}