{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T17:10:33Z","timestamp":1773249033473,"version":"3.50.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,3,13]],"date-time":"2024-03-13T00:00:00Z","timestamp":1710288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use the off-chip memory bandwidth inefficiently or do not scale well across memory channels.<\/jats:p>\n          <jats:p>In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. GraphScale combines multi-channel memory with asynchronous graph processing (i.e., for fast convergence on results) and a compressed graph representation (i.e., for efficient usage of memory bandwidth and reduced memory footprint). GraphScale solves common graph problems like breadth-first search, PageRank, and weakly connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design. Additionally, we extend GraphScale to scale to modern high-bandwidth memory (HBM) and reduce partitioning overhead of large graphs with binary packing.<\/jats:p>","DOI":"10.1145\/3616497","type":"journal-article","created":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T12:11:46Z","timestamp":1694607106000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6706-0353","authenticated-orcid":false,"given":"Jonas","family":"Dann","sequence":"first","affiliation":[{"name":"SAP SE, Walldorf, Germany, and Heidelberg University, Heidelberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6146-3365","authenticated-orcid":false,"given":"Daniel","family":"Ritter","sequence":"additional","affiliation":[{"name":"SAP SE, Walldorf, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9562-0680","authenticated-orcid":false,"given":"Holger","family":"Fr\u00f6ning","sequence":"additional","affiliation":[{"name":"Heidelberg University, Heidelberg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2024,3,13]]},"reference":[{"key":"e_1_3_3_2_2","first-page":"105","volume-title":"ISCA","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In ISCA. ACM, 105\u2013117."},{"key":"e_1_3_3_3_2","doi-asserted-by":"crossref","unstructured":"Vo Ngoc Anh and Alistair Moffat. 2010. Index compression using 64-bit words. Software: Practice and Experience 40 2 (2010) 131\u2013147.","DOI":"10.1002\/spe.948"},{"key":"e_1_3_3_4_2","first-page":"228","volume-title":"IPDPS","author":"Attia Osama G.","year":"2014","unstructured":"Osama G. Attia, Tyler Johnson, Kevin Townsend, Phillip H. Jones, and Joseph Zambreno. 2014. CyGraph: A reconfigurable architecture for parallel breadth-first search. In IPDPS. 228\u2013235."},{"key":"e_1_3_3_5_2","first-page":"203","volume-title":"IISWC","author":"Balaji Vignesh","year":"2018","unstructured":"Vignesh Balaji and Brandon Lucia. 2018. When is graph reordering an optimization? Studying the effect of lightweight graph reordering across applications and input graphs. In IISWC. IEEE Computer Society, 203\u2013214."},{"key":"e_1_3_3_6_2","article-title":"Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries","volume":"1910","author":"Besta Maciej","year":"2019","unstructured":"Maciej Besta, Emanuel Peter, Robert Gerstenberger, Marc Fischer, Michal Podstawski, Claude Barthels, Gustavo Alonso, and Torsten Hoefler. 2019. Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries. CoRR abs\/1910.09017 (2019).","journal-title":"CoRR"},{"key":"e_1_3_3_7_2","first-page":"8","volume-title":"ASAP","author":"Betkaoui Brahim","year":"2012","unstructured":"Brahim Betkaoui, Yu Wang, David B. Thomas, and Wayne Luk. 2012. A reconfigurable computing approach for efficient and scalable parallel graph exploration. In ASAP. 8\u201315."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3517141"},{"key":"e_1_3_3_9_2","first-page":"69","volume-title":"FPGA","author":"Chen Xinyu","year":"2021","unstructured":"Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2021. ThunderGP: HLS-based graph processing framework on FPGAs. In FPGA. 69\u201380."},{"key":"e_1_3_3_10_2","first-page":"217","volume-title":"FPGA","author":"Dai Guohao","year":"2017","unstructured":"Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and Huazhong Yang. 2017. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In FPGA. 217\u2013226."},{"key":"e_1_3_3_11_2","first-page":"3:1\u20133:10","volume-title":"GRADES-NDA","author":"Dann Jonas","year":"2021","unstructured":"Jonas Dann, Daniel Ritter, and Holger Fr\u00f6ning. 2021. Demystifying memory access patterns of FPGA-based graph processing accelerators. In GRADES-NDA. ACM, 3:1\u20133:10."},{"key":"e_1_3_3_12_2","first-page":"101","volume-title":"BTW","author":"Dann Jonas","year":"2021","unstructured":"Jonas Dann, Daniel Ritter, and Holger Fr\u00f6ning. 2021. Exploring memory access patterns for graph processing accelerators. In BTW. 101\u2013122."},{"key":"e_1_3_3_13_2","first-page":"24","volume-title":"FPL","author":"Dann Jonas","year":"2022","unstructured":"Jonas Dann, Daniel Ritter, and Holger Fr\u00f6ning. 2022. GraphScale: Scalable bandwidth-efficient graph processing on FPGAs. In FPL. 24\u201332."},{"key":"e_1_3_3_14_2","doi-asserted-by":"crossref","unstructured":"Jonas Dann Daniel Ritter and Holger Fr\u00f6ning. 2023. Non-relational databases on FPGAs: Survey design decisions challenges. ACM Computing Surveys 55 11 (2023) 225:1\u2013225:37.","DOI":"10.1145\/3568990"},{"key":"e_1_3_3_15_2","first-page":"152","volume-title":"IPDPS","author":"Holzinger Philipp","year":"2021","unstructured":"Philipp Holzinger, Daniel Reiser, Tobias Hahn, and Marc Reichenbach. 2021. Fast HBM access with FPGAs: Analysis, architectures, and applications. In IPDPS. 152\u2013159."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1952.273898"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.892708"},{"key":"e_1_3_3_18_2","volume-title":"IRACST-ESTIJ","author":"Lei Guoqing","year":"2015","unstructured":"Guoqing Lei, Li Rong-chun, and Song Guo. 2015. TorusBFS: A novel message-passing parallel breadth-first search architecture on FPGAs. In IRACST-ESTIJ, Vol. 5, 313\u2013318."},{"key":"e_1_3_3_19_2","article-title":"SNAP Datasets: Stanford Large Network Dataset Collection","author":"Leskovec Jure","year":"2014","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data","journal-title":"http:\/\/snap.stanford.edu\/data"},{"key":"e_1_3_3_20_2","first-page":"147","volume-title":"FPGA","author":"Liu Chenhao","year":"2021","unstructured":"Chenhao Liu, Zhiyuan Shao, Kexin Li, Minkang Wu, Jiajie Chen, Ruoshi Li, Xiaofei Liao, and Hai Jin. 2021. ScalaBFS: A scalable BFS accelerator on FPGA-HBM platform. In FPGA. ACM, 147."},{"key":"e_1_3_3_21_2","doi-asserted-by":"crossref","unstructured":"Andrew Lumsdaine Douglas Gregor Bruce Hendrickson and Jonathan Berry. 2007. Challenges in parallel graph processing. Parallel Processing Letters 17 01 (2007) 5\u201320.","DOI":"10.1142\/S0129626407002843"},{"key":"e_1_3_3_22_2","doi-asserted-by":"crossref","unstructured":"Alistair Moffat and Lang Stuiver. 2000. Binary interpolative coding for effective index compression. Information Retrieval 3 1 (2000) 25\u201347.","DOI":"10.1023\/A:1013002601898"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1971.1090789"},{"key":"e_1_3_3_24_2","volume-title":"AAAIhttp:\/\/networkrepository.com","author":"Rossi Ryan A.","year":"2015","unstructured":"Ryan A. Rossi and Nesreen K. Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In AAAI. http:\/\/networkrepository.com"},{"key":"e_1_3_3_25_2","volume-title":"FPGA","author":"Shao Zhiyuan","year":"2019","unstructured":"Zhiyuan Shao, Ruoshi Li, Diqing Hu, Xiaofei Liao, and Hai Jin. 2019. Improving performance of graph processing on FPGA-DRAM platform by two-level vertex caching. In FPGA, 320\u2013329."},{"key":"e_1_3_3_26_2","first-page":"307","volume-title":"FPL","author":"Sierra Roberto","year":"2019","unstructured":"Roberto Sierra, Filippo Mangani, Carlos Carreras, and Gabriel Caffarena. 2019. High-performance decoding of variable-length memory data packets for FPGA stream processing. In FPL. 307\u2013313."},{"key":"e_1_3_3_27_2","first-page":"8:1\u20138:12","volume-title":"PACT","author":"Yao Pengcheng","year":"2018","unstructured":"Pengcheng Yao, Long Zheng, Xiaofei Liao, Hai Jin, and Bingsheng He. 2018. An efficient graph accelerator with parallel data conflict management. In PACT. 8:1\u20138:12."},{"key":"e_1_3_3_28_2","first-page":"229","volume-title":"FPGA","author":"Zhang Jialiang","year":"2018","unstructured":"Jialiang Zhang and Jing Li. 2018. Degree-aware hybrid graph traversal on FPGA-HMC platform. In FPGA. 229\u2013238."},{"key":"e_1_3_3_29_2","first-page":"1","volume-title":"ReConfig","author":"Zhou Shijie","year":"2015","unstructured":"Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2015. Optimizing memory performance for FPGA implementation of PageRank. In ReConfig. 1\u20136."},{"key":"e_1_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Shijie Zhou Rajgopal Kannan Viktor K. Prasanna Guna Seetharaman and Qing Wu. 2019. HitGraph: High-throughput graph processing framework on FPGA. IEEE Transactions on Parallel and Distributed Systems 30 10 (2019) 2249\u20132264.","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_3_31_2","first-page":"375","volume-title":"USENIX ATC","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In USENIX ATC. 375\u2013386."},{"key":"e_1_3_3_32_2","first-page":"59","volume-title":"ICDE","author":"Zukowski Marcin","year":"2006","unstructured":"Marcin Zukowski, S\u00e1ndor H\u00e9man, Niels Nes, and Peter A. Boncz. 2006. Super-scalar RAM-CPU cache compression. In ICDE. 59."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3616497","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3616497","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:45:32Z","timestamp":1750178732000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3616497"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,13]]},"references-count":31,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3616497"],"URL":"https:\/\/doi.org\/10.1145\/3616497","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,13]]},"assertion":[{"value":"2023-01-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-11","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}