{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T12:48:16Z","timestamp":1751460496499,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T00:00:00Z","timestamp":1742428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB4502300"],"award-info":[{"award-number":["2023YFB4502300"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62322205, 62072195, and 62372199"],"award-info":[{"award-number":["62322205, 62072195, and 62372199"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Graph processing is pivotal in deriving insights from complex data structures but faces performance limitations due to the irregular nature of graphs. Traditional general-purpose processors often struggle with low instruction-level parallelism and energy inefficiency when handling graph data. In response, modern graph accelerators have embraced an intra-edge-parallel model to enhance parallelization, significantly outperforming conventional processors. 
However, the indiscriminate processing of edges in existing systems results in substantial computational redundancy, negatively impacting overall efficiency.<\/jats:p>\n          <jats:p>This article introduces PRAGA, an innovative graph accelerator designed to optimize efficiency by selectively processing edges that significantly contribute to final results while preserving high computational parallelism. PRAGA utilizes an intra-edge-sequential model, prioritizing edge processing to capitalize on coarse-grained vertex-level parallelism and minimize unnecessary computations. It incorporates a hot-value manager to alleviate network-on-chip congestion and a memory-aware coalescer to minimize redundant data accesses. Our experimental results, obtained using a Xilinx Alveo U280 FPGA accelerator card, demonstrate that PRAGA achieves speedups of 17.88\u00d7 and 5.86\u00d7 over state-of-the-art accelerators ScalaGraph and GraphDyns, respectively, and outperforms the advanced GPU-based system Gunrock by 22.52\u00d7 on average. 
This substantial improvement underscores PRAGA\u2019s potential to redefine performance benchmarks in graph processing.<\/jats:p>","DOI":"10.1145\/3701998","type":"journal-article","created":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T10:04:05Z","timestamp":1732615445000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["PRAGA: A Priority-Aware Hardware\/Software Co-design for High-Throughput Graph Processing Acceleration"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7847-1909","authenticated-orcid":false,"given":"Long","family":"Zheng","sequence":"first","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9388-6127","authenticated-orcid":false,"given":"Bing","family":"Zhu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4701-2239","authenticated-orcid":false,"given":"Pengcheng","family":"Yao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-6416-5807","authenticated-orcid":false,"given":"Yuhang","family":"Zhou","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3002-9361","authenticated-orcid":false,"given":"Chengao","family":"Pan","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9982-9408","authenticated-orcid":false,"given":"Wenju","family":"Zhao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6302-813X","authenticated-orcid":false,"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3934-7605","authenticated-orcid":false,"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology","place":["Wuhan, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0380-3506","authenticated-orcid":false,"given":"Jingling","family":"Xue","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, UNSW Sydney","place":["Kensington, Australia"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,20]]},"reference":[{"issue":"12","key":"e_1_3_1_2_2","first-page":"122310:1\u2013122310","article-title":"A secure routing scheme based on social network analysis in wireless mesh networks","volume":"59","author":"Yao Yu","year":"2016","unstructured":"Yu Yao, Zhaolong Ning, and Guo Lei. 2016. A secure routing scheme based on social network analysis in wireless mesh networks. 
Science China Information Sciences 59, 12 (2016), 122310:1\u2013122310:12.","journal-title":"Science China Information Sciences"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2022.3181300"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1038\/nrn2575"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173197"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00051"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783759"},{"key":"e_1_3_1_9_2","first-page":"17","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 
17\u201330."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.50"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00053"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00039"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477603"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00043"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-018-7167-0"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00092"},{"key":"e_1_3_1_19_2","first-page":"599","volume-title":"Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation","author":"Gonzalez Joseph E.","year":"2014","unstructured":"Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. 599\u2013613."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-019-9020-5"},{"key":"e_1_3_1_21_2","unstructured":"NVIDIA. 2016. nvGRAPH. Retrieved from https:\/\/developer.nvidia.com\/nvgraph."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00053"},{"key":"e_1_3_1_24_2","first-page":"373","volume-title":"Proceedings of the 22nd USENIX Conference on File and Storage Technologies","author":"Yang Tsunyu","year":"2024","unstructured":"Tsunyu Yang, Yizou Chen, Yuhong Liang, and Mingchang Yang. 2024. 
Seraph: Towards scalable and efficient fully-external graph computation via on-demand processing. In Proceedings of the 22nd USENIX Conference on File and Storage Technologies. 373\u2013387."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.54"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00023"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358318"},{"key":"e_1_3_1_28_2","unstructured":"Intel Corporation. 2020. Intel Xeon Gold 6330 Processor. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/212458\/intel-xeon-gold-6330-processor-42m-cache-2-00-ghz\/specifications.html"},{"key":"e_1_3_1_29_2","unstructured":"NVIDIA Corporation. 2020. NVIDIA A100 Tensor Core GPU Architecture. Retrieved from https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243201"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00062"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654101"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1975.224157"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080253"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356151"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00010"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358254"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2973991"},{"key":"e_1_3_1_41_2","unstructured":"Jure Leskovec and Andrej Krevl. 2014. 
SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http:\/\/snap.stanford.edu\/data"},{"issue":"2","key":"e_1_3_1_42_2","first-page":"985","article-title":"Kronecker graphs: An approach to modeling networks.","volume":"11","author":"Leskovec Jure","year":"2010","unstructured":"Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11, 2 (2010), 985\u20131042.","journal-title":"Journal of Machine Learning Research"},{"issue":"22","key":"e_1_3_1_43_2","first-page":"45","article-title":"Introducing the graph 500","volume":"19","author":"Murphy Richard C.","year":"2010","unstructured":"Richard C. Murphy, Kyle B. Wheeler, Brian W. Barrett, and James A. Ang. 2010. Introducing the graph 500. Cray Users Group 19, 22 (2010), 45\u201374.","journal-title":"Cray Users Group"},{"key":"e_1_3_1_44_2","unstructured":"NVIDIA. 2021. NVIDIA System Management Interface. Retrieved from https:\/\/developer.nvidia.com\/system-management-interface"},{"key":"e_1_3_1_45_2","unstructured":"Xilinx. 2019. Xilinx Board Utility. Retrieved from https:\/\/www.xilinx.com\/htmldocs\/xilinx20191\/sdacceldoc\/xilinx-board-swiss-armyknife-utility-ufa1504034339078.html"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387537"},{"key":"e_1_3_1_47_2","first-page":"195","volume-title":"Proceedings of the 2017 USENIX Annual Technical Conference","author":"Ma Lingxiao","year":"2017","unstructured":"Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the 2017 USENIX Annual Technical Conference. 
195\u2013207."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"},{"key":"e_1_3_1_49_2","first-page":"573","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference","author":"Zheng Long","year":"2020","unstructured":"Long Zheng, Xianliang Li, Yaohui Zheng, Yu Huang, Xiaofei Liao, Hai Jin, Jingling Xue, Zhiyuan Shao, and Qiangsheng Hua. 2020. Scaph: Scalable GPU-accelerated graph processing with value-driven differential scheduling. In Proceedings of the 2020 USENIX Annual Technical Conference. 573\u2013588."},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00077"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00028"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL57034.2022.00016"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643582"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394885.3431548"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00078"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41404.2022.00050"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701998","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701998","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:16Z","timestamp":1750298236000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701998"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,20]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3701998"],"URL":"https:\/\/doi.org\/10.1145\/3701998","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,20]]},"assertion":[{"value":"2024-03-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}