{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T14:18:42Z","timestamp":1772893122438,"version":"3.50.1"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T00:00:00Z","timestamp":1670544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>\n            FPGA has been an emerging computing infrastructure in datacenters benefiting from fine-grained parallelism, energy efficiency, and reconfigurability. Meanwhile, graph processing has attracted tremendous interest in data analytics, and its performance is in increasing demand with the rapid growth of data. Many works have been proposed to tackle the challenges of designing efficient FPGA-based accelerators for graph processing. However, the largely overlooked programmability still requires hardware design expertise and sizable development efforts from developers.\n            <jats:italic>ThunderGP<\/jats:italic>\n            , a high-level synthesis based graph processing framework on FPGAs, is hence proposed to close the gap, with which developers could enjoy high performance of FPGA-accelerated graph processing by writing only a few high-level functions with no knowledge of the hardware. ThunderGP adopts the gather-apply-scatter model as the abstraction of various graph algorithms and realizes the model by a built-in highly parallel and memory-efficient accelerator template. 
With high-level functions as inputs, ThunderGP automatically explores massive resources of multiple super-logic regions of modern FPGA platforms to generate and deploy accelerators, as well as schedule tasks for them. Although ThunderGP on DRAM-based platforms is memory bandwidth bounded, recent high bandwidth memory (HBM) brings large potentials to performance. However, the system bottleneck shifts from memory bandwidth to resource consumption on HBM-enabled platforms. Therefore, we further propose to improve resource efficiency of ThunderGP to utilize more memory bandwidth from HBM. We conduct evaluation with seven common graph applications and 19 graphs. ThunderGP on DRAM-based hardware platforms provides 1.9\u00d7 \u223c 5.2\u00d7 improvement on bandwidth efficiency over the state of the art, whereas ThunderGP on HBM-based hardware platforms delivers up to 5.2\u00d7 speedup over the state-of-the-art RTL-based approach. This work is open sourced on GitHub at\u00a0https:\/\/github.com\/Xtra-Computing\/ThunderGP\n.\n          <\/jats:p>","DOI":"10.1145\/3517141","type":"journal-article","created":{"date-parts":[[2022,3,5]],"date-time":"2022-03-05T14:06:41Z","timestamp":1646489201000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS"],"prefix":"10.1145","volume":"15","author":[{"given":"Xinyu","family":"Chen","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feng","family":"Cheng","sequence":"additional","affiliation":[{"name":"National University of Singapore and City University of Hong Kong, Kowloon Tong, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongshi","family":"Tan","sequence":"additional","affiliation":[{"name":"National University of Singapore, 
Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yao","family":"Chen","sequence":"additional","affiliation":[{"name":"Advanced Digital Sciences Center, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weng-Fai","family":"Wong","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deming","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana\u2013Champaign, Illinois, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,12,9]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Alibaba Cloud","year":"2020","unstructured":"Alibaba. 2020. Alibaba Cloud. Retrieved March 8, 2022 from https:\/\/www.alibabacloud.com\/.","journal-title":"https:\/\/www.alibabacloud.com\/."},{"key":"e_1_3_1_3_2","article-title":"Amazon F1 Cloud","year":"2020","unstructured":"Amazon. 2020. Amazon F1 Cloud. Retrieved March 8, 2022 from https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/.","journal-title":"https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00054"},{"key":"e_1_3_1_5_2","article-title":"Graph processing on FPGAs: Taxonomy, survey, challenges","author":"Besta Maciej","year":"2019","unstructured":"Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, and Torsten Hoefler. 2019. Graph processing on FPGAs: Taxonomy, survey, challenges. 
arXiv preprint arXiv:1903.06697 (2019).","journal-title":"arXiv preprint arXiv:1903.06697"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3203217.3203267"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cps.2016.0020"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2597917.2597929"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00014"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2019.00020"},{"key":"e_1_3_1_11_2","volume-title":"Proceedings of the Conference on Innovative Data Systems Research (CIDR\u201920)","author":"Chen Xinyu","year":"2020","unstructured":"Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2020. Is FPGA useful for hash joins? In Proceedings of the Conference on Innovative Data Systems Research (CIDR\u201920)."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586184"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439290"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2497259"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293915"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS47774.2020.00120"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439301"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC.2018.8465940"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062208"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847339"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"issue":"1","key":"e_1_3_1_22_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2049662.2049663","article-title":"The University of Florida sparse matrix collection","volume":"38","author":"Davis 
Timothy A.","year":"2011","unstructured":"Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software 38, 1 (2011), 1\u201325.","journal-title":"ACM Transactions on Mathematical Software"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2016.7577360"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357596"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317802"},{"key":"e_1_3_1_26_2","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912). 17\u201330."},{"key":"e_1_3_1_27_2","unstructured":"Graph 500. 2020. Home Page. Retrieved March 8, 2022 from https:\/\/graph500.org\/."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-019-1914-z"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643582"},{"key":"e_1_3_1_30_2","article-title":"Shuhai: A tool for benchmarking high bandwidth memory on FPGAs","author":"Huang Hongjing","year":"2021","unstructured":"Hongjing Huang, Zeke Wang, Jie Zhang, Zhenhao He, Chao Wu, Jun Xiao, and Gustavo Alonso. 2021. Shuhai: A tool for benchmarking high bandwidth memory on FPGAs. IEEE Transactions on Computers. Early access, April 28, 2021.","journal-title":"IEEE Transactions on Computers."},{"key":"e_1_3_1_31_2","article-title":"Intel FPGA SDK for OpenCL Pro Edition: Programming Guide","year":"2020","unstructured":"Intel. 2020. Intel FPGA SDK for OpenCL Pro Edition: Programming Guide. 
Retrieved March 8, 2022 from https:\/\/www.intel.com\/content\/www\/us\/en\/programmable\/documentation\/mwh1391807965224.html.","journal-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/programmable\/documentation\/mwh1391807965224.html."},{"key":"e_1_3_1_32_2","article-title":"Intel\u00ae Stratix\u00ae 10 FPGAs","year":"2021","unstructured":"Intel. 2021. Intel\u00ae Stratix\u00ae 10 FPGAs. Retrieved March 8, 2022 from https:\/\/www.intel.sg\/content\/www\/xa\/en\/products\/details\/fpga\/stratix\/10.html.","journal-title":"https:\/\/www.intel.sg\/content\/www\/xa\/en\/products\/details\/fpga\/stratix\/10.html."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00013"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1007\/978-3-540-24777-7_9","volume-title":"Multidimensional knapsack problems","author":"Kellerer Hans","year":"2004","unstructured":"Hans Kellerer, Ulrich Pferschy, and David Pisinger. 2004. Multidimensional knapsack problems. In Knapsack Problems. Springer, 235\u2013283."},{"issue":"2","key":"e_1_3_1_35_2","article-title":"Kronecker graphs: An approach to modeling networks.","volume":"11","author":"Leskovec Jure","year":"2010","unstructured":"Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11, 2 (2010), 985\u20131042.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_36_2","article-title":"SNAP Datasets: Stanford Large Network Dataset Collection","author":"Leskovec Jure","year":"2014","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. 
Retrieved March 8, 2022 from http:\/\/snap.stanford.edu\/data.","journal-title":"http:\/\/snap.stanford.edu\/data."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080228"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00056"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439463"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847274"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626407002843"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2513673"},{"key":"e_1_3_1_44_2","article-title":"Nimbix Cloud Computing","year":"2020","unstructured":"Nimbix. 2020. Nimbix Cloud Computing. Retrieved March 8, 2022 from https:\/\/www.nimbix.net\/.","journal-title":"https:\/\/www.nimbix.net\/."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.15"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847337"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.proeng.2012.09.545"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847343"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9277"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2815400.2815408"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2018.00011"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASICON.2011.6157401"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293900"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/356887.356892"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378450"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447818.3461664"
},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1201\/9780429399602"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00021"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062251"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446058"},{"key":"e_1_3_1_61_2","unstructured":"Mike Wissolik, Darren Zacher, Anthony Torza, and Brandon Da. 2017. Virtex UltraScale+ HBM FPGA: A Revolutionary Increase in Memory Performance. White Paper. Xilinx."},{"key":"e_1_3_1_62_2","article-title":"Vivado Design Suite\u2014 Vivado AXI Reference Guide","year":"2017","unstructured":"Xilinx. 2017. Vivado Design Suite\u2014 Vivado AXI Reference Guide. Retrieved March 8, 2022 from https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/axi_ref_guide\/latest\/ug1037-vivado-axi-reference-guide.pdf.","journal-title":"https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/axi_ref_guide\/latest\/ug1037-vivado-axi-reference-guide.pdf."},{"key":"e_1_3_1_63_2","article-title":"Large FPGA Methodology Guide","year":"2020","unstructured":"Xilinx. 2020. Large FPGA Methodology Guide. Retrieved March 8, 2022 from https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx14_7\/ug872_largefpga.pdf.","journal-title":"https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx14_7\/ug872_largefpga.pdf."},{"key":"e_1_3_1_64_2","unstructured":"Xilinx. 2020. SDAccel: Enabling Hardware-Accelerated Software. Retrieved March 8, 2022 from https:\/\/www.xilinx.com\/products\/design-tools\/legacy-tools\/sdaccel.html."},{"key":"e_1_3_1_65_2","article-title":"UltraScale Architecture Memory Resources","year":"2020","unstructured":"Xilinx. 2020. UltraScale Architecture Memory Resources. 
Retrieved March 8, 2022 from https:\/\/www.xilinx.com\/support\/documentation\/user_guides\/ug573-ultrascale-memory-resources.pdf.","journal-title":"https:\/\/www.xilinx.com\/support\/documentation\/user_guides\/ug573-ultrascale-memory-resources.pdf."},{"key":"e_1_3_1_66_2","article-title":"Xilinx Adaptive Compute Cluster (XACC) Program","year":"2020","unstructured":"Xilinx. 2020. Xilinx Adaptive Compute Cluster (XACC) Program. Retrieved March 8, 2022 from https:\/\/www.xilinx.com\/support\/university\/XUP-XACC.html.","journal-title":"https:\/\/www.xilinx.com\/support\/university\/XUP-XACC.html."},{"key":"e_1_3_1_67_2","article-title":"Xilinx Runtime Library (XRT)","year":"2020","unstructured":"Xilinx. 2020. Xilinx Runtime Library (XRT). Retrieved March 8, 2022 from https:\/\/github.com\/Xilinx\/XRT.","journal-title":"https:\/\/github.com\/Xilinx\/XRT."},{"key":"e_1_3_1_68_2","article-title":"Alveo U280 Data Center Accelerator Card: User Guide","year":"2021","unstructured":"Xilinx. 2021. Alveo U280 Data Center Accelerator Card: User Guide. 
Retrieved March 8, 2022 from https:\/\/www.mouser.com\/pdfDocs\/u280userguide.pdf.","journal-title":"https:\/\/www.mouser.com\/pdfDocs\/u280userguide.pdf."},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT51103.2020.00028"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243201"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ReConFig.2015.7393332"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.35"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3203217.3203233"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2017.25"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3517141","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3517141","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:02Z","timestamp":1750183742000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3517141"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,9]]},"references-count":74,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3517141"],"URL":"https:\/\/doi.org\/10.1145\/3517141","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,9]]},"assertion":[{"value":"2021-08-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-02-05","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}