{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:35:21Z","timestamp":1772724921376,"version":"3.50.1"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T00:00:00Z","timestamp":1590710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2018YFB1003502"],"award-info":[{"award-number":["2018YFB1003502"]}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"crossref","award":["DP180104069"],"award-info":[{"award-number":["DP180104069"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61825202, 61832006, and 61702201"],"award-info":[{"award-number":["61825202, 61832006, and 61702201"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>\n            FPGA-based graph processing accelerators are nowadays equipped with multiple pipelines for hardware acceleration of graph computations. However, their multi-pipeline efficiency can suffer greatly from the considerable overheads caused by the read\/write conflicts in their on-chip BRAM from different pipelines, leading to significant performance degradation and poor scalability. 
In this article, we investigate the underlying causes behind such inter-pipeline read\/write conflicts by focusing on multi-pipeline FPGAs for accelerating\n            <jats:italic>Sparse Matrix Vector Multiplication<\/jats:italic>\n            (SpMV) arising in graph processing. We exploit our key insight that the problem of eliminating inter-pipeline read\/write conflicts for SpMV can be formulated as one of solving a row- and column-wise tiling problem for its associated adjacency matrix. However, how to partition a sparse adjacency matrix obtained from any graph with respect to a set of pipelines by both eliminating all the inter-pipeline read\/write conflicts and keeping all the pipelines reasonably load-balanced is challenging.\n          <\/jats:p>\n          <jats:p>We present a conflict-free scheduler, WaveScheduler, that can dispatch different sub-matrix tiles to different pipelines without any read\/write conflict. We also introduce two optimizations that are specifically tailored for graph processing, \u201cdegree-aware vertex index renaming\u201d for improving load balancing and \u201cdata re-organization\u201d for enabling sequential off-chip memory access, for all the pipelines. Our evaluation on a Xilinx\u00ae Alveo\u2122 U250 accelerator card with 16 pipelines shows that WaveScheduler can achieve up to 3.57 GTEPS, running much faster than native scheduling and two state-of-the-art FPGA-based graph accelerators (by 6.48\u00d7 for \u201cnative,\u201d 2.54\u00d7 for HEGP, and 2.11\u00d7 for ForeGraph), on average. 
In particular, these performance gains also scale up significantly as the number of pipelines increases.<\/jats:p>","DOI":"10.1145\/3390523","type":"journal-article","created":{"date-parts":[[2020,5,30]],"date-time":"2020-05-30T04:22:06Z","timestamp":1590812526000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs"],"prefix":"10.1145","volume":"17","author":[{"given":"Qinggang","family":"Wang","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"given":"Long","family":"Zheng","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"given":"Jieshan","family":"Zhao","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"given":"Jingling","family":"Xue","sequence":"additional","affiliation":[{"name":"University of New South Wales, Sydney, 
Australia"}]}],"member":"320","published-online":{"date-parts":[[2020,5,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750386"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389013"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3226228"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(02)00218-1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.43"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498258"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557049"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-S6-S6"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847339"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192404"},{"key":"e_1_2_1_13_1","volume-title":"Bader","author":"Madduri Kamesh","year":"2013","unstructured":"Kamesh Madduri and David A. Bader. 2013. GTgraph: A Synthetic Graph Generator Suite. Retrieved from http:\/\/www.cse.psu.edu\/ kxm85\/software\/GTgraph\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301670"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3301490"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.23"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083676"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912)","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. 
Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912). USENIX Association, 17--30."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.780876"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-019-1914-z"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783759"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2006.88"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487581"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470459"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(98)00093-3"},{"key":"e_1_2_1_26_1","unstructured":"Intel. 2018. Introduction to Intel FPGA Design Flow for Xilinx Users. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/programmable\/documentation\/mtr1422491996806.html#rrk1514930092144."},{"key":"e_1_2_1_27_1","volume-title":"Towards dataflow based graph processing. Sci. China Inform. Sci. 60, 12","author":"Jin Hai","year":"2017","unstructured":"Hai Jin, Pengcheng Yao, and Xiaofei Liao. 2017. 
Towards dataflow based graph processing. Sci. China Inform. Sci. 60, 12 (2017), 126102:1--126102:3."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2017.150"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.106"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC\u201916)","author":"Kepner Jeremy","unstructured":"Jeremy Kepner, Peter Aaltonen, David A. Bader, Aydin Bulu\u00e7, Franz Franchetti, John R. Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski, Timothy G. Mattson, and Jos\u00e9 E. Moreira. 2016. Mathematical foundations of the GraphBLAS. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC\u201916). IEEE Computer Society, 1--9."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 20th International Symposium on Field-programmable Custom Computing Machines (FCCM\u201912)","author":"Kestur Srinidhi","unstructured":"Srinidhi Kestur, John D. Davis, and Eric S. Chung. 2012. Towards a universal FPGA matrix-vector multiplication architecture. 
In Proceedings of the 20th International Symposium on Field-programmable Custom Computing Machines (FCCM\u201912). IEEE Computer Society, 9--16."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465369"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/99163.99183"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1880037.1880041"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150476"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.2000.892065"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912)","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912). USENIX Association, 31--46."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919)","author":"Lakhotia Kartik","unstructured":"Kartik Lakhotia, Rajgopal Kannan, Sourav Pati, and Viktor K. Prasanna. 2019. GPOP: A cache and memory-efficient framework for graph processing over partitions. 
In Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919). ACM, 393--394."},{"key":"e_1_2_1_39_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from: http:\/\/snap.stanford.edu\/data."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205313"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-018-7167-0"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.914756"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.1999.807526"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.54"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169752"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/WAINA.2010.85"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02577866"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815400.2815408"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_2_1_54_1","volume-title":"SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations. Technical Report NASA-CR-185876","author":"Saad Youcef","year":"1990","unstructured":"Youcef Saad. 1990. 
SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations. Technical Report NASA-CR-185876. University of Illinois."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.88484"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610518"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807655"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913)","author":"Shun Julian","unstructured":"Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913). ACM, 135--146."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00052"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3282495.3282501"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45545-0_23"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041294"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2009.5377624"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809983"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2014.6974716"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2016.02.015"},{"key":"e_1_2_1_67_1","first-page":"1","article-title":"Gunrock: A high-performance graph processing library on the GPU. 
In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201916)","volume":"11","author":"Wang Yangzihao","year":"2016","unstructured":"Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201916). ACM, 11:1--11:12.","journal-title":"ACM"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918)","author":"Xie Biwei","year":"2018","unstructured":"Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, and Lixin Zhang. 2018. CVR: Efficient vectorization of SpMV on x86 processors. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918). ACM, 149--162."},{"key":"e_1_2_1_69_1","unstructured":"Xilinx. 2005. Xilinx Application Note. Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/application_notes\/xapp463.pdsf."},{"key":"e_1_2_1_70_1","unstructured":"Xilinx. 2019. SDAccel Environment User Guide. 
Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx2019_1\/ug1023-sdaccel-user-guide.pdf."},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-019-9020-5"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243201"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-016-5595-8"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037708"},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201918)","author":"Zhang Yu","year":"2018","unstructured":"Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Ligang He, Bingsheng He, and Haikun Liu. 2018. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proceedings of the USENIX Annual Technical Conference (ATC\u201918). USENIX Association, 441--452."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3170434"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243206"},{"key":"e_1_2_1_78_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918)","author":"Zheng Long","year":"2018","unstructured":"
Long Zheng, Xiaofei Liao, Hai Jin, Jieshan Zhao, and Qinggang Wang. 2018. Scalable concurrency debugging with distributed graph processing. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918). ACM, 188--199."},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the 24th IEEE International Symposium on Field-programmable Custom Computing Machines (FCCM\u201916)","author":"Zhou Shijie","unstructured":"Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2016. High-throughput and energy-efficient graph processing on FPGA. In Proceedings of the 24th IEEE International Symposium on Field-programmable Custom Computing Machines (FCCM\u201916). IEEE Computer Society, Washington, DC, 103--110."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021734"},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Zhu Xiaowei","year":"2016","unstructured":"Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 
USENIX Association, 301--316."},{"key":"e_1_2_1_83_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201915)","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proceedings of the USENIX Annual Technical Conference (ATC\u201915). USENIX Association, 375--386."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3390523","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3390523","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:32Z","timestamp":1750200092000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3390523"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,29]]},"references-count":82,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3390523"],"URL":"https:\/\/doi.org\/10.1145\/3390523","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,29]]},"assertion":[{"value":"2019-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2020-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}