{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T03:29:19Z","timestamp":1769830159296,"version":"3.49.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CNS-1908507"],"award-info":[{"award-number":["CNS-1908507"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>We present BurstZ+, an accelerator platform that eliminates the communication bottleneck between PCIe-attached scientific computing accelerators and their host servers, via hardware-optimized compression. While accelerators such as GPUs and FPGAs provide enormous computing capabilities, their effectiveness quickly deteriorates once data is larger than their on-board memory capacity, and performance becomes limited by the communication bandwidth of moving data between the host memory and accelerator. Compression has not been very useful in solving this problem due to the performance and efficiency challenges of compressing floating-point numbers, of which scientific data often consists. BurstZ+ is an FPGA-based prototype accelerator platform that addresses the bandwidth issue via ZFP-V, a class of novel hardware-optimized floating-point compression algorithms. We demonstrate that BurstZ+ can completely remove the host-side communication bottleneck for accelerators, using multiple stencil kernels with a wide range of operational intensities. 
Evaluated against hand-optimized implementations of kernel accelerators of the same architecture, our single-pipeline BurstZ+ prototype outperforms an accelerator without compression by almost 4\u00d7, and even an accelerator with enough memory for the entire dataset by over 2\u00d7. Furthermore, the projected performance of BurstZ+ on a future, faster FPGA scales to almost 7\u00d7 that of the same accelerator without compression, whose performance is still limited by the PCIe bandwidth.<\/jats:p>","DOI":"10.1145\/3476831","type":"journal-article","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T15:40:27Z","timestamp":1643643627000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["BurstZ+: Eliminating The Communication Bottleneck of Scientific Computing Accelerators via Accelerated Compression"],"prefix":"10.1145","volume":"15","author":[{"given":"Gongjin","family":"Sun","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of California, Irvine"}]},{"given":"Seongyoung","family":"Kang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Kookmin University, Irvine"}]},{"given":"Sang-Woo","family":"Jun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California, Irvine"}]}],"member":"320","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080216"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2009.38"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-9-4381-2016"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.963416"},{"key":"e_1_3_1_6_2","first-page":"1","volume-title":"Transactions on Large-Scale Data-and Knowledge-Centered Systems XV","author":"Bre\u00df Sebastian","year":"2014","unstructured":"Sebastian 
Bre\u00df, Max Heimel, Norbert Siegmund, Ladjel Bellatreche, and Gunter Saake. 2014. GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XV. Springer, 1\u201335."},{"key":"e_1_3_1_7_2","unstructured":"F. Cappello M. Ainsworth J. Bessac Martin Burtscher Jong Youl Choi E. Constantinescu S. Di H. Guo Peter Lindstrom and Ozan Tugluk. Accessed April 2020. Scientific Data Reduction Benchmarks. https:\/\/sdrbench.github.io\/."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2002.1058095"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240850"},{"key":"e_1_3_1_11_2","article-title":"Lz4: Extremely fast compression algorithm","author":"Collet Yann","year":"2013","unstructured":"Yann Collet et\u00a0al. 2013. Lz4: Extremely fast compression algorithm. code.google.com (2013).","journal-title":"code.google.com"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2488491"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/SAAHPC.2011.29"},{"key":"e_1_3_1_14_2","first-page":"219","article-title":"Auto-tuning stencil computations on multicore and accelerators","author":"Datta Kaushik","year":"2010","unstructured":"Kaushik Datta, Samuel Williams, Vasily Volkov, Jonathan Carter, Leonid Oliker, John Shalf, and Katherine Yelick. 2010. Auto-tuning stencil computations on multicore and accelerators. Scientific Computing on Multicore and Accelerators (2010), 219\u2013253.","journal-title":"Scientific Computing on Multicore and Accelerators"},{"key":"e_1_3_1_15_2","unstructured":"Ga\u00ebl Deest Nicolas Estibals Tomofumi Yuki Steven Derrien and Sanjay Rajopadhye. 2016. 
Towards scalable and efficient FPGA stencil accelerators."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056781"},{"key":"e_1_3_1_17_2","volume-title":"Data Compression of Climate Simulation Data","author":"Dennis John","year":"2013","unstructured":"John Dennis. 2013. Data Compression of Climate Simulation Data. https:\/\/www2.cisl.ucar.edu\/sites\/default\/files\/dennis-cas2k13.pdf."},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Peter Deutsch. 1996. DEFLATE compressed data format specification version 1.3. (1996).","DOI":"10.17487\/rfc1951"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.11"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1137\/18M1168832"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ReConFig.2013.6732318"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2016.61"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2014.6968747"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03770-2_27"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1137\/19M126904X"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1002\/mrm.24389"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.2172\/1463232"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/PCCC.2017.8280472"},{"key":"e_1_3_1_29_2","volume-title":"AN 870: Stencil Computation Reference Design","year":"2018","unstructured":"Intel. 2018. AN 870: Stencil Computation Reference Design. 
https:\/\/www.intel.com\/content\/www\/us\/en\/programmable\/documentation\/abw1532533443842.html."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2014.6903690"},{"key":"e_1_3_1_31_2","volume-title":"2020 The 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Jin Sian","year":"2020","unstructured":"Sian Jin, Pascal Grosset, Christopher Biwer, Jesus Pulido, Jiannan Tian, Dingwen Tao, and James Ahrens. 2020. Understanding GPU-based lossy compression for extreme-scale cosmological simulations. In 2020 The 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)."},{"key":"e_1_3_1_32_2","first-page":"1","volume-title":"2020 IEEE\/ACM 4th International Workshop on Software Correctness for HPC Applications (Correctness)","author":"Joseph Vinu","year":"2020","unstructured":"Vinu Joseph, Nithin Chalapathi, Aditya Bhaskara, Ganesh Gopalakrishnan, Pavel Panchekha, and Mu Zhang. 2020. Correctness-preserving compression of datasets and neural network models. In 2020 IEEE\/ACM 4th International Workshop on Software Correctness for HPC Applications (Correctness). IEEE, 1\u20139."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICNC.2012.67"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/PDP.2015.51"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304053"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2017.09.011"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346458"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2006.143"},{"key":"e_1_3_1_39_2","article-title":"ZFP related projects","year":"2020","unstructured":"LLNL. Accessed April 2020. ZFP related projects. 
https:\/\/computing.llnl.gov\/projects\/floating-point-compression\/related-projects.","journal-title":"https:\/\/computing.llnl.gov\/projects\/floating-point-compression\/related-projects"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00044"},{"key":"e_1_3_1_41_2","first-page":"89","volume-title":"Proceedings of the 1st International Workshop on High-performance Stencil Computations, Vienna","author":"Maruyama Naoya","year":"2014","unstructured":"Naoya Maruyama and Takayuki Aoki. 2014. Optimizing stencil computations for NVIDIA Kepler GPUs. In Proceedings of the 1st International Workshop on High-performance Stencil Computations, Vienna. 89\u201395."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.Companion.2012.136"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-85729-455-5"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.2"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2012.6339257"},{"key":"e_1_3_1_46_2","volume-title":"LZO Real-time Data Compression Library","author":"Oberhumer Markus F. X. J.","year":"2017","unstructured":"Markus F. X. J. Oberhumer. 2017. LZO Real-time Data Compression Library. http:\/\/www.oberhumer.com\/opensource\/lzo\/."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2927964.2927967"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.3390\/atmos10100578"},{"issue":"2","key":"e_1_3_1_49_2","first-page":"86","article-title":"Performance limits study of stencil codes on modern GPGPUs","volume":"6","author":"Pershin Ilya S.","year":"2019","unstructured":"Ilya S. Pershin, Vadim D. Levchenko, and Anastasia Y. Perepelkina. 2019. Performance limits study of stencil codes on modern GPGPUs. 
Supercomputing Frontiers and Innovations 6, 2 (2019), 86\u2013101.","journal-title":"Supercomputing Frontiers and Innovations"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2006.35"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/800075.802449"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884046"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.51"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/2641361.2641369"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2007.4439254"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2691770"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/1995896.1995922"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.97"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2013.6645580"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2011.05.025"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00022"},{"key":"e_1_3_1_62_2","first-page":"1","volume-title":"Proceedings of the Workshop on Biomedicine in Computing: Systems, Architectures, and Circuits","author":"Szafaryn Lukasz G.","year":"2009","unstructured":"Lukasz G. Szafaryn, Kevin Skadron, and Jeffrey J. Saucerman. 2009. Experiences accelerating MATLAB systems biology applications. In Proceedings of the Workshop on Biomedicine in Computing: Systems, Architectures, and Circuits. 
1\u20134."},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3053688"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2614981"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062185"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2020.2992548"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1984.1659158"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2011.80"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00042"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2002.804276"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014951"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174248"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2018.00027"}],"container-title":["ACM Transactions on Reconfigurable Technology and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476831","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3476831","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3476831","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:20Z","timestamp":1750268960000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476831"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,31]]},"references-count":72,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3476831"],"URL":"https:\/\/doi.org\/10.1145\/3476831","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,31]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}