{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T21:30:47Z","timestamp":1774474247355,"version":"3.50.1"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T00:00:00Z","timestamp":1674518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004412","name":"Turkish Academy of Sciences","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004412","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Technological Research Council of Turkey","award":["119E559"],"award-info":[{"award-number":["119E559"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>Hardware systems composed of diverse execution resources are being deployed to cope with the complexity and performance requirements of Artificial Intelligence (AI) and Machine Learning (ML) applications. With the emergence of new hardware platforms, system-wide programming support has become much more important. While this is true for various devices ranging from CPUs to GPUs, it is especially critical for specific neural network accelerators implemented on FPGAs. For example, Intel\u2019s recent HARP platform encompasses a Xeon CPU and an FPGA, which requires an intense software stack to be used effectively. Programming such a hybrid system will be a challenge for most of the non-expert users. High-level language solutions such as Intel OpenCL for FPGA try to address the problem. However, as the abstraction level increases, the efficiency of implementation decreases, depicting two opposing requirements. In this work, we propose a framework to generate HLS-based, FPGA-accelerated, high-throughput\/work-efficient, synthesizable, and template-based graph-processing pipeline. While a fixed and clock-wise precisely designed deep-pipeline architecture, written in SystemC, is responsible for processing graph vertices, the user implements the intended iterative graph algorithm by implementing\/modifying only a single module in C\/C++. This way, efficiency and high performance can be achieved with better programmability and productivity. With similar programming efforts, it is shown that the proposed template outperforms a high-throughput OpenCL baseline by up to 50% in terms of edge throughput. Furthermore, the novel work-efficient design significantly improves execution time and power consumption by up to 100\u00d7.<\/jats:p>","DOI":"10.1145\/3529256","type":"journal-article","created":{"date-parts":[[2022,4,20]],"date-time":"2022-04-20T12:03:23Z","timestamp":1650456203000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["HLS-based High-throughput and Work-efficient Synthesizable Graph Processing Template Pipeline"],"prefix":"10.1145","volume":"22","author":[{"given":"Hamzeh","family":"Ahangari","sequence":"first","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}]},{"given":"Muhammet Mustafa","family":"\u00d6zdal","sequence":"additional","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}]},{"given":"\u00d6zcan","family":"\u00d6zt\u00fcrk","sequence":"additional","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}]}],"member":"320","published-online":{"date-parts":[[2023,1,24]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbl022"},{"key":"e_1_3_1_3_2","unstructured":"Amazon Corporation. 2019. Enable faster FPGA accelerator development and deployment in the cloud. Retrieved from https:\/\/aws.amazon.com\/ec2\/instance-types\/f1."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2706562"},{"key":"e_1_3_1_5_2","volume-title":"IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"Basak Abanti","year":"2019","unstructured":"Abanti Basak, Shuangchen Li, Xing Hu, Sang Min Oh, Xinfeng Xie, Li Zhao, Xiaowei Jiang, and Yuan Xie. 2019. Analysis and optimization of the memory hierarchy for graph processing workloads. In IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_1_6_2","first-page":"820","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Beamer Scott","year":"2017","unstructured":"Scott Beamer, Krste Asanovi\u0107, and David Patterson. 2017. Reducing PageRank communication via propagation blocking. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 820\u2013831."},{"key":"e_1_3_1_7_2","first-page":"308","volume-title":"24th Annual ACM symposium on Parallelism in Algorithms and Architectures","author":"Blelloch Guy E.","year":"2012","unstructured":"Guy E. Blelloch, Jeremy T. Fineman, and Julian Shun. 2012. Greedy sequential maximal independent set and matching are parallel on average. In 24th Annual ACM symposium on Parallelism in Algorithms and Architectures. 308\u2013317."},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1109\/IISWC.2012.6402918","volume-title":"IEEE International Symposium on Workload Characterization (IISWC)","author":"Burtscher Martin","year":"2012","unstructured":"Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In IEEE International Symposium on Workload Characterization (IISWC). 141\u2013151."},{"key":"e_1_3_1_9_2","volume-title":"49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Caulfield Adrian M.","year":"2016","unstructured":"Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, et\u00a0al. 2016. A cloud-scale acceleration architecture. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_1_10_2","article-title":"GaaS-X: Graph analytics accelerator supporting sparse data representation using crossbar architectures","author":"Challapalle Nagadastagiri","year":"2020","unstructured":"Nagadastagiri Challapalle, Sahithi Rampalli, Linghao Song, Nandhini Chandramoorthy, Karthik Swaminathan, John Sampson, Yiran Chen, and Vijaykrishnan Narayanan. 2020. GaaS-X: Graph analytics accelerator supporting sparse data representation using crossbar architectures. In ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).","journal-title":"ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.022071131"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1109\/FCCM.2018.00023","volume-title":"IEEE 26th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM)","author":"Cong Jason","year":"2018","unstructured":"Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, and Shaochong Zhang. 2018. Understanding performance differences of FPGAs and GPUs. In IEEE 26th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM). 93\u201396."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3200691.3178506"},{"key":"e_1_3_1_15_2","volume-title":"49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Ham Tae Jun","year":"2016","unstructured":"Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_1_16_2","unstructured":"Intel Corporation. 2019. Rapid design methods for developing hardware accelerators. Retrieved from https:\/\/github.com\/intel\/rapid-design-methods-for-developing-hardware-accelerators."},{"key":"e_1_3_1_17_2","unstructured":"Intel Corporation. 2020. Intel FPGA SDK for OpenCL Pro Edition: Best Practices Guide. Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/opencl-sdk\/aocl-best-practices-guide.pdf."},{"key":"e_1_3_1_18_2","unstructured":"Intel Corporation. 2020. Intel FPGA SDK for OpenCL Pro Edition: Programming Guide. Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/opencl-sdk\/aocl_programming_guide.pdf."},{"key":"e_1_3_1_19_2","article-title":"Transformations of high-level synthesis codes for high-performance computing","volume":"1805","author":"Licht Johannes de Fine","year":"2018","unstructured":"Johannes de Fine Licht, Simon Meierhans, and Torsten Hoefler. 2018. Transformations of high-level synthesis codes for high-performance computing. Computing Research Repository (CoRR) abs\/1805.08288 (2018).","journal-title":"Computing Research Repository (CoRR)"},{"key":"e_1_3_1_20_2","volume-title":"51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Mukkara Anurag","year":"2018","unstructured":"Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, and Daniel Sanchez. 2018. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling. In 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_1_21_2","first-page":"5","volume-title":"ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Nurvitadhi Eriko","year":"2017","unstructured":"Eriko Nurvitadhi, Ganesh Venkatesh, Jaewoong Sim, Debbie Marr, Randy Huang, Jason Ong Gee Hock, Yeong Tat Liew, Krishnan Srivatsan, Duncan Moss, Suchit Subhaschandra, and Guy Boudoukh. 2017. Can FPGAs beat GPUs in accelerating next-generation deep neural networks? In ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 5\u201314."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/2840819.2840913"},{"key":"e_1_3_1_23_2","first-page":"166","volume-title":"ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","author":"Ozdal Muhammet Mustafa","year":"2016","unstructured":"Muhammet Mustafa Ozdal, Serif Yesil, Taemin Kim, Andrey Ayupov, John Greth, Steven Burns, and Ozcan Ozturk. 2016. Energy efficient architecture for graph analytics accelerators. In ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 166\u2013177."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2014.6818295"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(85)90024-9"},{"key":"e_1_3_1_26_2","volume-title":"Design, Automation and Test in Europe (DATE)","author":"Siret Nicolas","year":"2012","unstructured":"Nicolas Siret, Matthieu Wipliez, Jean Fran\u00e7ois Nezan, and Francesca Palumbo. 2012. Generation of efficient high-level hardware code from dataflow programs. In Design, Automation and Test in Europe (DATE)."},{"key":"e_1_3_1_27_2","unstructured":"Texas A&M University. 2020. The SuiteSparse Matrix Collection. Retrieved from https:\/\/sparse.tamu.edu."},{"key":"e_1_3_1_28_2","unstructured":"Twitter Corporation. 2020. Twitter Q3 2020 Earnings Report. Retrieved from https:\/\/investor.twitterinc.com\/financial-information\/quarterly-results."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3390523"},{"key":"e_1_3_1_30_2","volume-title":"52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Yan Mingyu","year":"2019","unstructured":"Mingyu Yan, Xing Hu, Shuangchen Li, Abanti Basak, Han Li, Xin Ma, Itir Akgun, Yujing Feng, Peng Gu, Lei Deng, et\u00a0al. 2019. Alleviating irregularity in graph analytics acceleration: A hardware\/software co-design approach. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243201"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/2840819.2840926"},{"key":"e_1_3_1_33_2","first-page":"229","volume-title":"ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Zhang Jialiang","year":"2018","unstructured":"Jialiang Zhang and Jing Li. 2018. Degree-aware hybrid graph traversal on FPGA-HMC platform. In ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 229\u2013238."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/FCCM.2016.35","volume-title":"IEEE 24th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM)","author":"Zhou Shijie","year":"2016","unstructured":"Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2016. High-throughput and energy-efficient graph processing on FPGA. In IEEE 24th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM). 103\u2013110."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.34"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529256","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3529256","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:20Z","timestamp":1750186940000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,24]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3529256"],"URL":"https:\/\/doi.org\/10.1145\/3529256","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,24]]},"assertion":[{"value":"2021-10-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}