{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T07:35:25Z","timestamp":1763105725504,"version":"3.37.3"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2020,5,28]],"date-time":"2020-05-28T00:00:00Z","timestamp":1590624000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,28]],"date-time":"2020-05-28T00:00:00Z","timestamp":1590624000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Sign Process Syst"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To best leverage high-bandwidth storage and network technologies requires an improvement in the speed at which we can decompress data. We present a \u201crefine and recycle\u201d method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs and present an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) to a set of commands that target a single bank of block ram, and rather than performing all the dependency calculations saves logic by recycling (read) commands that return with an invalid result. A single \u201cSnappy\u201d decompressor implemented in reconfigurable logic leveraging this method is capable of processing multiple literal or copy tokens per cycle and achieves up to 7.2GB\/s, which can keep pace with an NVMe device. The proposed method is about an order of magnitude faster and an order of magnitude more power efficient than a state-of-the-art single-core software implementation. 
The logic and block ram resources required by the decompressor are sufficiently low so that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth provided by the most recent interface technologies.<\/jats:p>","DOI":"10.1007\/s11265-020-01547-w","type":"journal-article","created":{"date-parts":[[2020,5,28]],"date-time":"2020-05-28T06:02:26Z","timestamp":1590645746000},"page":"931-947","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic"],"prefix":"10.1007","volume":"92","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1077-1859","authenticated-orcid":false,"given":"Jian","family":"Fang","sequence":"first","affiliation":[]},{"given":"Jianyu","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Jinho","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Zaid","family":"Al-Ars","sequence":"additional","affiliation":[]},{"given":"H. Peter","family":"Hofstee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,28]]},"reference":[{"key":"1547_CR1","unstructured":"lzbench. available: https:\/\/github.com\/inikep\/lzbench. Accessed: 2019-05-15."},{"key":"1547_CR2","unstructured":"Uf sparse matrix collection. available: https:\/\/www.cise.ufl.edu\/research\/sparse\/MM\/LAW\/hollywood-2009.tar.gz."},{"key":"1547_CR3","unstructured":"Zstandard. available: http:\/\/facebook.github.io\/zstd\/. Accessed: 2019-05-15."},{"key":"1547_CR4","unstructured":"Adler, M. (2015). pigz: A parallel implementation of gzip for modern multi-processor, multi-core machines. Jet Propulsion Laboratory."},{"key":"1547_CR5","unstructured":"Agarwal, K.B., Hofstee, H.P., Jamsek, D.A., & Martin, A.K. (2014). High bandwidth decompression of variable length encoded data streams. 
US Patent 8,824,569."},{"key":"1547_CR6","unstructured":"Alpha Data. (2018) ADM-PCIE-9V3 User Manual. available: https:\/\/www.alpha-data.com\/pdfs\/adm-pcie-9v3usermanual_v2_7.pdf. Accessed: 2019-05-15."},{"key":"1547_CR7","unstructured":"Apache: Apache ORC. https:\/\/orc.apache.org\/. Accessed: 2018-12-01."},{"key":"1547_CR8","unstructured":"Apache: Apache Parquet. http:\/\/parquet.apache.org\/. Accessed: 2018-12-01."},{"key":"1547_CR9","doi-asserted-by":"crossref","unstructured":"Bart\u00edk, M., Ubik, S., & Kubalik, P. (2015). LZ4 compression algorithm on FPGA. In 2015 IEEE International Conference on Electronics, Circuits, and Systems (ICECS), (pp 179\u2013182): IEEE.","DOI":"10.1109\/ICECS.2015.7440278"},{"key":"1547_CR10","doi-asserted-by":"crossref","unstructured":"Fang, J., Chen, J., Al-Ars, Z., Hofstee, P., & Hidders, J. (2018). A high-bandwidth Snappy decompressor in reconfigurable logic: work-in-progress. In Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (pp. 16:1\u201316:2): IEEE Press.","DOI":"10.1109\/CODESISSS.2018.8525953"},{"key":"1547_CR11","doi-asserted-by":"crossref","unstructured":"Fang, J., Chen, J., Lee, J., Al-Ars, Z., & Hofstee, H. (2019). Refine and recycle: a method to increase decompression parallelism. In 2019 IEEE 30Th international conference on application-specific systems, architectures and processors (ASAP) (pp. 272\u2013280): IEEE.","DOI":"10.1109\/ASAP.2019.00017"},{"issue":"1","key":"1547_CR12","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s00778-019-00581-w","volume":"29","author":"J Fang","year":"2020","unstructured":"Fang, J., Mulder, Y. T. B., Hidders, J., Lee, J., & Hofstee, H.P. (2020). In-memory database acceleration on FPGAs: a survey. The VLDB Journal, 29(1), 33\u201359. https:\/\/doi.org\/10.1007\/s00778-019-00581-w.","journal-title":"The VLDB Journal"},{"key":"1547_CR13","doi-asserted-by":"crossref","unstructured":"Fowers, J., Kim, J. 
Y., Burger, D., & Hauck, S. (2015). A scalable high-bandwidth architecture for lossless compression on fpgas. In 2015 IEEE 23rd annual international symposium on Field-programmable custom computing machines (FCCM) (pp. 52\u201359): IEEE.","DOI":"10.1109\/FCCM.2015.46"},{"key":"1547_CR14","unstructured":"Gilchrist, J. (2004). Parallel data compression with bzip2. In Proceedings of the 16th IASTED international conference on parallel and distributed computing and systems (vol. 16, pp. 559\u2013564)."},{"key":"1547_CR15","unstructured":"Google: Snappy. https:\/\/github.com\/google\/snappy\/. Accessed: 2018-12-01."},{"key":"1547_CR16","unstructured":"Gopal, V., Gulley, S.M., & Guilford, J.D. (2017). Technologies for efficient LZ77-based data decompression. US Patent App. 15\/374,462."},{"key":"1547_CR17","unstructured":"Huebner, M., Ullmann, M., Weissel, F., & Becker, J. (2004). Real-time configuration code decompression for dynamic fpga self-reconfiguration. In 2004. Proceedings. 18th international Parallel and distributed processing symposium (pp. 138): IEEE."},{"key":"1547_CR18","unstructured":"Inc., C. (2016). ZipAccel-D GUNZIP\/ZLIB\/Inflate Data Decompression Core. http:\/\/www.cast-inc.com\/ip-cores\/data\/zipaccel-d\/cast-zipaccel-d-x.pdf. Accessed: 2019-03-01."},{"key":"1547_CR19","doi-asserted-by":"crossref","unstructured":"Jang, H., Kim, C., & Lee, J. W. (2013). Practical speculative parallelization of variable-length decompression algorithms. In ACM SIGPLAN Notices (vol. 48, pp. 55\u201364): ACM.","DOI":"10.1145\/2499369.2465557"},{"issue":"2","key":"1547_CR20","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1145\/1534916.1534919","volume":"2","author":"D Koch","year":"2009","unstructured":"Koch, D., Beckhoff, C., & Teich, J. (2009). Hardware decompression techniques for FPGA-based embedded systems. 
ACM Transactions on Reconfigurable Technology and Systems, 2(2), 9.","journal-title":"ACM Transactions on Reconfigurable Technology and Systems"},{"key":"1547_CR21","unstructured":"Leibson, S., & Mehta, N. (2013). Xilinx ultrascale: The next-generation architecture for your next-generation architecture. Xilinx White Paper WP435."},{"key":"1547_CR22","unstructured":"Mahoney, M. (2011). Large text compression benchmark available: http:\/\/www.mattmahoney.net\/text\/text.html."},{"key":"1547_CR23","unstructured":"Mahony, A.O., Tringale, A., Duquette, J.J., & O\u2019carroll, P. (2018). Reduction of execution stalls of LZ4 decompression via parallelization. US Patent 9,973,210."},{"key":"1547_CR24","doi-asserted-by":"crossref","unstructured":"Qiao, W., Du, J., Fang, Z., Lo, M., Chang, M. C. F., & Cong, J. (2018). High-Throughput lossless compression on tightly coupled CPU-FPGA platforms. In Proceedings of the 2018 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 291\u2013291): ACM.","DOI":"10.1145\/3174243.3174987"},{"key":"1547_CR25","unstructured":"Qiao, Y. (2018). An FPGA-based Snappy Decompressor-Filter. Master\u2019s thesis, Delft University of Technology."},{"key":"1547_CR26","doi-asserted-by":"crossref","unstructured":"Sitaridi, E., Mueller, R., Kaldewey, T., Lohman, G., & Ross, K. A. (2016). Massively-parallel lossless data decompression. In Proceedings of the international conference on parallel processing (pp. 242\u2013247): IEEE.","DOI":"10.1109\/ICPP.2016.35"},{"key":"1547_CR27","unstructured":"Stuecheli, J. A new standard for high performance memory, acceleration and networks. http:\/\/opencapi.org\/2017\/04\/opencapi-new-standard-high-performance-memory-acceleration-networks\/. Accessed: 2018-06-03."},{"key":"1547_CR28","unstructured":"Xilinx: Vitis Data Compression Library. 
https:\/\/xilinx.github.io\/Vitis_Libraries\/data_compression\/source\/results.html. Accessed: 2020-02-15."},{"issue":"10","key":"1547_CR29","doi-asserted-by":"publisher","first-page":"2842","DOI":"10.1109\/TVLSI.2017.2713527","volume":"25","author":"J Yan","year":"2017","unstructured":"Yan, J., Yuan, J., Leong, P. H., Luk, W., & Wang, L. (2017). Lossless compression decoders for bitstreams and software binaries based on high-level synthesis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(10), 2842\u2013 2855.","journal-title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems"},{"key":"1547_CR30","doi-asserted-by":"crossref","unstructured":"Zhou, X., Ito, Y., & Nakano, K. (2016). An efficient implementation of LZW decompression in the FPGA. In 2016 IEEE International parallel and distributed processing symposium workshops (IPDPSW) (pp. 599\u2013607): IEEE.","DOI":"10.1109\/IPDPSW.2016.33"},{"issue":"3","key":"1547_CR31","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","volume":"23","author":"J Ziv","year":"1977","unstructured":"Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. 
IEEE Transactions on information theory, 23(3), 337\u2013343.","journal-title":"IEEE Transactions on information theory"}],"container-title":["Journal of Signal Processing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-020-01547-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11265-020-01547-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-020-01547-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,28]],"date-time":"2021-05-28T00:21:48Z","timestamp":1622161308000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11265-020-01547-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,28]]},"references-count":31,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["1547"],"URL":"https:\/\/doi.org\/10.1007\/s11265-020-01547-w","relation":{},"ISSN":["1939-8018","1939-8115"],"issn-type":[{"type":"print","value":"1939-8018"},{"type":"electronic","value":"1939-8115"}],"subject":[],"published":{"date-parts":[[2020,5,28]]},"assertion":[{"value":"2 December 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 March 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 May 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}