{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T01:07:43Z","timestamp":1755220063022,"version":"3.43.0"},"reference-count":16,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2025,8,4]]},"abstract":"<jats:p>Most of the commonly used compression standards make use of some form of the LZ algorithm. Decompressing this type of data is not a good match for the Single-Instruction, Multiple-Thread (SIMT) model of computation used by GPUs, resulting in low throughput and poor utilization of the GPU parallel compute capabilities. In this paper, we introduce GSST, a GPU-optimized version of the FSST compression algorithm, which targets string compression. The optimizations proposed in this paper make the algorithm particularly suitable for GPUs, which allows it to achieve a significantly better tradeoff for decompression throughput vs compression ratio as compared to the state of the art. Our results show that the new algorithm pushes the Pareto curve closer towards the ideal region, completely dominating LZ-based compressors in the nvCOMP library (LZ4, Snappy, GDeflate). GSST provides a compression ratio of 2.74x and achieves a throughput of 191 GB\/s on an A100 GPU.<\/jats:p>","DOI":"10.1145\/3759441.3759450","type":"journal-article","created":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T14:43:44Z","timestamp":1754491424000},"page":"55-61","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GSST: Parallel string decompression at 191 GB\/s on GPU"],"prefix":"10.1145","volume":"59","author":[{"given":"Robin","family":"Vonk","sequence":"first","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]},{"given":"Joost","family":"Hoozemans","sequence":"additional","affiliation":[{"name":"Voltron Data, worldwide remote, USA"}]},{"given":"Zaid","family":"Al-Ars","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2025,8,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598587"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3662010.3663450"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW52791.2021.00035"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2019.00017"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1186\/1756-0500-4-261"},{"key":"e_1_2_1_7_1","volume-title":"Noh","author":"Kim Byungseok","year":"2017","unstructured":"Byungseok Kim, Jaeho Kim, and Sam H. Noh. 2017. Managing Array of SSDs When the Storage Device Is No Longer the Performance Bottleneck. In 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 17). USENIX Association, Santa Clara, CA. https:\/\/www.usenix.org\/conference\/hotstorage17\/program\/presentation\/kim"},{"key":"e_1_2_1_8_1","volume-title":"CPU Bandwidth - The Worrisome 2020 Trend. (23","author":"Kruger Fritz","year":"2016","unstructured":"Fritz Kruger. 2016. CPU Bandwidth - The Worrisome 2020 Trend. (23 March 2016). https:\/\/blog.westerndigital.com\/cpu-bandwidth-the-worrisome-2020-trend\/"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/PDP50117.2020.00045"},{"key":"e_1_2_1_10_1","unstructured":"NVIDIA. 2024. nvCOMP library. https:\/\/developer.nvidia.com\/nvcomp"},{"key":"e_1_2_1_11_1","unstructured":"NVIDIA. 2024. NVIDIA Blackwell Architecture Technical Brief. https:\/\/resources.nvidia.com\/en-us-blackwell-architecture"},{"key":"e_1_2_1_12_1","volume-title":"Michael Garland, Nikolay Sakharnykh, and Wen-mei Hwu.","author":"Park Jeongmin","year":"2023","unstructured":"Jeongmin Park, Zaid Qureshi, Vikram Mailthody, Andrew Gacek, Shunfan Shao, Mohammad AlMasri, Isaac Gelado, Jinjun Xiong, Chris Newburn, I-Hsin Chung, Michael Garland, Nikolay Sakharnykh, and Wen-mei Hwu. 2023. CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs. arXiv:2307.03760 [cs.DC]"},{"key":"e_1_2_1_13_1","unstructured":"Sebastian Deorowicz. 2024. Silesia Compression Corpus. https:\/\/sun.aei.polsl.pl\/~sdeor\/index.php?page=silesia"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526132"},{"key":"e_1_2_1_15_1","unstructured":"Yury Uralsky. 2024. Accelerating Load Times for DirectX Games and Apps with GDeflate for DirectStorage. https:\/\/developer.nvidia.com\/blog\/accelerating-load-times-for-directx-games-and-apps-with-gdeflate-for-directstorage\/"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tbench.2023"}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3759441.3759450","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T19:50:42Z","timestamp":1754596242000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3759441.3759450"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,4]]},"references-count":16,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,8,4]]}},"alternative-id":["10.1145\/3759441.3759450"],"URL":"https:\/\/doi.org\/10.1145\/3759441.3759450","relation":{},"ISSN":["0163-5980"],"issn-type":[{"type":"print","value":"0163-5980"}],"subject":[],"published":{"date-parts":[[2025,8,4]]},"assertion":[{"value":"2025-08-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}