{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:33:41Z","timestamp":1772724821647,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":87,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,11]],"date-time":"2022-06-11T00:00:00Z","timestamp":1654905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,18]]},"DOI":"10.1145\/3470496.3527439","type":"proceedings-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T19:06:01Z","timestamp":1654023961000},"page":"946-961","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network"],"prefix":"10.1145","author":[{"given":"Shuangchen","family":"Li","sequence":"first","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dimin","family":"Niu","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuhao","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Han","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhe","family":"Zhang","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianchan","family":"Guan","sequence":"additional","affiliation":[{"name":"Alibaba 
Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yijin","family":"Guan","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heng","family":"Liu","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linyong","family":"Huang","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhaoyang","family":"Du","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fei","family":"Xue","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanwei","family":"Fang","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongzhong","family":"Zheng","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuan","family":"Xie","sequence":"additional","affiliation":[{"name":"Alibaba Group"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,6,11]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Computing graph neural networks: A survey from algorithms to accelerators. arXiv preprint arXiv:2010.00130","author":"Abadal Sergi","year":"2020","unstructured":"Sergi Abadal , Akshay Jain , Robert Guirado , Jorge L\u00f3pez-Alonso , and Eduard Alarc\u00f3n . 2020. Computing graph neural networks: A survey from algorithms to accelerators. arXiv preprint arXiv:2010.00130 ( 2020 ). Sergi Abadal, Akshay Jain, Robert Guirado, Jorge L\u00f3pez-Alonso, and Eduard Alarc\u00f3n. 2020. Computing graph neural networks: A survey from algorithms to accelerators. 
arXiv preprint arXiv:2010.00130 (2020)."},{"key":"e_1_3_2_1_2_1","unstructured":"AliCloud F3 2022. Alibaba Cloud Elastic Compute Service Compute optimized type family with FPGA. https:\/\/www.alibabacloud.com\/help\/doc-detail\/108504.html  AliCloud F3 2022. Alibaba Cloud Elastic Compute Service Compute optimized type family with FPGA. https:\/\/www.alibabacloud.com\/help\/doc-detail\/108504.html"},{"key":"e_1_3_2_1_3_1","unstructured":"AliCloud MoC 2022. Alibaba Cloud X-Dragon NiC. https:\/\/www.alibabacloud.com\/blog\/introducing-the-sixth-generation-of-alibaba-clouds-elastic-compute-service_595716  AliCloud MoC 2022. Alibaba Cloud X-Dragon NiC. https:\/\/www.alibabacloud.com\/blog\/introducing-the-sixth-generation-of-alibaba-clouds-elastic-compute-service_595716"},{"key":"e_1_3_2_1_4_1","volume-title":"PAI","year":"2022","unstructured":"AliCloud PAI 2022 . Alibaba Cloud's Machine Learning Platform for AI (PAI). https:\/\/www.alibabacloud.com\/product\/machine-learning AliCloud PAI 2022. Alibaba Cloud's Machine Learning Platform for AI (PAI). https:\/\/www.alibabacloud.com\/product\/machine-learning"},{"key":"e_1_3_2_1_5_1","unstructured":"AliCloud Price 2022. Alibaba Cloud Elastic Compute Service Price Calculator. https:\/\/www.alibabacloud.com\/pricing-calculator  AliCloud Price 2022. Alibaba Cloud Elastic Compute Service Price Calculator. https:\/\/www.alibabacloud.com\/pricing-calculator"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218751"},{"key":"e_1_3_2_1_7_1","unstructured":"AWS EC2 2022. Amazon EC2 F1 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/  AWS EC2 2022. Amazon EC2 F1 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/"},{"key":"e_1_3_2_1_8_1","unstructured":"AWS Nitro 2022. AWS Nitro System. https:\/\/aws.amazon.com\/ec2\/nitro\/  AWS Nitro 2022. AWS Nitro System. 
https:\/\/aws.amazon.com\/ec2\/nitro\/"},{"key":"e_1_3_2_1_9_1","volume-title":"FPGA","author":"Azure","year":"2022","unstructured":"Azure FPGA 2022 . Microsoft Azure FPGA Inference. https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/how-to-deploy-fpga-web-service Azure FPGA 2022. Microsoft Azure FPGA Inference. https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/how-to-deploy-fpga-web-service"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00051"},{"key":"e_1_3_2_1_11_1","unstructured":"Bluefield 2022. BlueField-3 the most powerful software-defined hardware-accelerated data center infrastructure on a chip. https:\/\/www.nvidia.com\/en-us\/networking\/products\/data-processing-unit\/  Bluefield 2022. BlueField-3 the most powerful software-defined hardware-accelerated data center infrastructure on a chip. https:\/\/www.nvidia.com\/en-us\/networking\/products\/data-processing-unit\/"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783710"},{"key":"e_1_3_2_1_13_1","volume-title":"FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 581--594","author":"Chen Xuhao","year":"2021","unstructured":"Xuhao Chen , Tianhao Huang , Shuotao Xu , Thomas Bourgeat , Chanwoo Chung , and Arvind Arvind . 2021 . FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 581--594 . Xuhao Chen, Tianhao Huang, Shuotao Xu, Thomas Bourgeat, Chanwoo Chung, and Arvind Arvind. 2021. FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 
IEEE, 581--594."},{"key":"e_1_3_2_1_14_1","volume-title":"Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training","author":"Chen Xiaobing","year":"2021","unstructured":"Xiaobing Chen , Yuke Wang , Xinfeng Xie , Xing Hu , Abanti Basak , Ling Liang , Mingyu Yan , Lei Deng , Yufei Ding , Zidong Du , 2021 . Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training . IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( 2021). Xiaobing Chen, Yuke Wang, Xinfeng Xie, Xing Hu, Abanti Basak, Ling Liang, Mingyu Yan, Lei Deng, Yufei Ding, Zidong Du, et al. 2021. Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2021)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415610"},{"key":"e_1_3_2_1_16_1","unstructured":"CXL 2022. Compute Express Link 2.0: The Breakthrough CPU-to-Device Interconnect. https:\/\/www.computeexpresslink.org\/  CXL 2022. Compute Express Link 2.0: The Breakthrough CPU-to-Device Interconnect. https:\/\/www.computeexpresslink.org\/"},{"key":"e_1_3_2_1_17_1","volume-title":"PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 595--608","author":"Dadu Vidushi","year":"2021","unstructured":"Vidushi Dadu , Sihao Liu , and Tony Nowatzki . 2021 . PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 595--608 . Vidushi Dadu, Sihao Liu, and Tony Nowatzki. 2021. PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 
IEEE, 595--608."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847339"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"key":"e_1_3_2_1_20_1","unstructured":"Extended Version 2022. Hyperscale FPGA-As-A-Service Architecture for Large-Scale Distributed Graph Neural Network (extended version). https:\/\/shuangchenli.github.io\/isca22tr.pdf  Extended Version 2022. Hyperscale FPGA-As-A-Service Architecture for Large-Scale Distributed Graph Neural Network (extended version). https:\/\/shuangchenli.github.io\/isca22tr.pdf"},{"key":"e_1_3_2_1_21_1","unstructured":"FaaS Benefit 2022. Alibaba Cloud FPGA-based ECS Instances Scenarios. https:\/\/partners-intl.aliyun.com\/help\/doc-detail\/163848.htm  FaaS Benefit 2022. Alibaba Cloud FPGA-based ECS Instances Scenarios. https:\/\/partners-intl.aliyun.com\/help\/doc-detail\/163848.htm"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00012"},{"key":"e_1_3_2_1_23_1","unstructured":"FPGA Board 2022. Bittware XUP-VV8. https:\/\/www.bittware.com\/fpga\/xup-vv8\/  FPGA Board 2022. Bittware XUP-VV8. https:\/\/www.bittware.com\/fpga\/xup-vv8\/"},{"key":"e_1_3_2_1_24_1","volume-title":"Learning graph representations with embedding propagation. Advances in neural information processing systems 30","author":"Duran Alberto Garcia","year":"2017","unstructured":"Alberto Garcia Duran and Mathias Niepert . 2017. Learning graph representations with embedding propagation. Advances in neural information processing systems 30 ( 2017 ), 5119--5130. Alberto Garcia Duran and Mathias Niepert. 2017. Learning graph representations with embedding propagation. Advances in neural information processing systems 30 (2017), 5119--5130."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00079"},{"key":"e_1_3_2_1_26_1","unstructured":"GENZ 2022. 
Gen-Z: An open systems Interconnect designed to provide memory-semantic access to data and devices via direct-attached switched or fabric topologies. https:\/\/genzconsortium.org\/  GENZ 2022. Gen-Z: An open systems Interconnect designed to provide memory-semantic access to data and devices via direct-attached switched or fabric topologies. https:\/\/genzconsortium.org\/"},{"key":"e_1_3_2_1_27_1","unstructured":"GRACE 2022. NVIDIA GRACE CPU. https:\/\/www.nvidia.com\/en-us\/data-center\/grace-cpu\/  GRACE 2022. NVIDIA GRACE CPU. https:\/\/www.nvidia.com\/en-us\/data-center\/grace-cpu\/"},{"key":"e_1_3_2_1_28_1","unstructured":"GraphLearn 2022. Graph-Learn (a.k.a. AliGraph): An Industrial Graph Neural Network Framework. https:\/\/github.com\/alibaba\/graph-learn  GraphLearn 2022. Graph-Learn (a.k.a. AliGraph): An Industrial Graph Neural Network Framework. https:\/\/github.com\/alibaba\/graph-learn"},{"key":"e_1_3_2_1_29_1","unstructured":"H100 2022. NVIDIA Hopper Architecture In-Depth. https:\/\/developer.nvidia.com\/blog\/nvidia-hopper-architecture-in-depth\/  H100 2022. NVIDIA Hopper Architecture In-Depth. https:\/\/developer.nvidia.com\/blog\/nvidia-hopper-architecture-in-depth\/"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783759"},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035","author":"Hamilton William L","year":"2017","unstructured":"William L Hamilton , Rex Ying , and Jure Leskovec . 2017 . Inductive representation learning on large graphs . In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035 . William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 
1025--1035."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358275"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00075"},{"key":"e_1_3_2_1_34_1","volume-title":"Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network. In 2022 IEEE Access","author":"Huang Linyong","unstructured":"Linyong Huang , Zhe Zhang , Shuangchen Li , Dimin Niu , Yijin Guan , Hongzhong Zheng , and Yuan Xie . 2022. Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network. In 2022 IEEE Access . IEEE. Linyong Huang, Zhe Zhang, Shuangchen Li, Dimin Niu, Yijin Guan, Hongzhong Zheng, and Yuan Xie. 2022. Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network. In 2022 IEEE Access. IEEE."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_3_2_1_36_1","volume-title":"SPR","author":"Intel","year":"2022","unstructured":"Intel SPR 2022 . Golden Cove - Microarchitectures - Intel. https:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/golden_cove Intel SPR 2022. Golden Cove - Microarchitectures - Intel. https:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/golden_cove"},{"key":"e_1_3_2_1_37_1","first-page":"187","article-title":"Improving the accuracy, scalability, and performance of graph neural networks with roc","volume":"2","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia , Sina Lin , Mingyu Gao , Matei Zaharia , and Alex Aiken . 2020 . Improving the accuracy, scalability, and performance of graph neural networks with roc . Proceedings of Machine Learning and Systems 2 (2020), 187 -- 198 . Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. 
Proceedings of Machine Learning and Systems 2 (2020), 187--198.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00010"},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the Workshop on Resource-Constrained Machine Learning (ReCoML","author":"Kiningham Kevin","year":"2020","unstructured":"Kevin Kiningham , Philip Levis , and Christopher R\u00e9 . 2020 . GReTA: Hardware Optimized Graph Processing for GNNs . In Proceedings of the Workshop on Resource-Constrained Machine Learning (ReCoML 2020). Kevin Kiningham, Philip Levis, and Christopher R\u00e9. 2020. GReTA: Hardware Optimized Graph Processing for GNNs. In Proceedings of the Workshop on Resource-Constrained Machine Learning (ReCoML 2020)."},{"key":"e_1_3_2_1_40_1","volume-title":"GRIP: a graph neural network accelerator architecture. arXiv preprint arXiv:2007.13828","author":"Kiningham Kevin","year":"2020","unstructured":"Kevin Kiningham , Christopher Re , and Philip Levis . 2020. GRIP: a graph neural network accelerator architecture. arXiv preprint arXiv:2007.13828 ( 2020 ). Kevin Kiningham, Christopher Re, and Philip Levis. 2020. GRIP: a graph neural network accelerator architecture. arXiv preprint arXiv:2007.13828 (2020)."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2016.11"},{"key":"e_1_3_2_1_42_1","volume-title":"GLIST: Towards In-Storage Graph Learning. In 2021 {USENIX} Annual Technical Conference ({USENIX} {ATC} 21). 225--238.","author":"Li Cangyuan","year":"2021","unstructured":"Cangyuan Li , Ying Wang , Cheng Liu , Shengwen Liang , Huawei Li , and Xiaowei Li . 2021 . GLIST: Towards In-Storage Graph Learning. In 2021 {USENIX} Annual Technical Conference ({USENIX} {ATC} 21). 225--238. Cangyuan Li, Ying Wang, Cheng Liu, Shengwen Liang, Huawei Li, and Xiaowei Li. 2021. GLIST: Towards In-Storage Graph Learning. In 2021 {USENIX} Annual Technical Conference ({USENIX} {ATC} 21). 
225--238."},{"key":"e_1_3_2_1_43_1","volume-title":"Graph-Theta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. arXiv preprint arXiv:2104.10569","author":"Li Houyi","year":"2021","unstructured":"Houyi Li , Yongchao Liu , Yongyong Li , Bin Huang , Peng Zhang , Guowei Zhang , Xintan Zeng , Kefeng Deng , Wenguang Chen , and Changhua He. 2021. Graph-Theta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. arXiv preprint arXiv:2104.10569 ( 2021 ). Houyi Li, Yongchao Liu, Yongyong Li, Bin Huang, Peng Zhang, Guowei Zhang, Xintan Zeng, Kefeng Deng, Wenguang Chen, and Changhua He. 2021. Graph-Theta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. arXiv preprint arXiv:2104.10569 (2021)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00070"},{"key":"e_1_3_2_1_45_1","volume-title":"Engn: A high-throughput and energy-efficient accelerator for large graph neural networks","author":"Liang Shengwen","year":"2020","unstructured":"Shengwen Liang , Ying Wang , Cheng Liu , Lei He , LI Huawei , Dawen Xu , and Xiaowei Li . 2020 . Engn: A high-throughput and energy-efficient accelerator for large graph neural networks . IEEE Trans. Comput . (2020). Shengwen Liang, Ying Wang, Cheng Liu, Lei He, LI Huawei, Dawen Xu, and Xiaowei Li. 2020. Engn: A high-throughput and energy-efficient accelerator for large graph neural networks. IEEE Trans. Comput. (2020)."},{"key":"e_1_3_2_1_46_1","volume-title":"Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications. In 2021 IEEE\/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 1--9.","author":"Lin Jilan","year":"2021","unstructured":"Jilan Lin , Shuangchen Li , Yufei Ding , and Yuan Xie . 2021 . Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications. In 2021 IEEE\/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 1--9. 
Jilan Lin, Shuangchen Li, Yufei Ding, and Yuan Xie. 2021. Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications. In 2021 IEEE\/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 1--9."},{"key":"e_1_3_2_1_47_1","volume-title":"GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware. arXiv preprint arXiv:2108.11571","author":"Liu Xin","year":"2021","unstructured":"Xin Liu , Mingyu Yan , Shuhan Song , Zhengyang Lv , Wenming Li , Guangyu Sun , Xiaochun Ye , and Dongrui Fan . 2021. GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware. arXiv preprint arXiv:2108.11571 ( 2021 ). Xin Liu, Mingyu Yan, Shuhan Song, Zhengyang Lv, Wenming Li, Guangyu Sun, Xiaochun Ye, and Dongrui Fan. 2021. GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware. arXiv preprint arXiv:2108.11571 (2021)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3272010"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ESSCIRC.2019.8902909"},{"key":"e_1_3_2_1_50_1","volume-title":"Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In 2017 {USENIX} Annual Technical Conference ({USENIX} {ATC} 17). 195--207.","author":"Ma Lingxiao","year":"2017","unstructured":"Lingxiao Ma , Zhi Yang , Han Chen , Jilong Xue , and Yafei Dai . 2017 . Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In 2017 {USENIX} Annual Technical Conference ({USENIX} {ATC} 17). 195--207. Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In 2017 {USENIX} Annual Technical Conference ({USENIX} {ATC} 17). 
195--207."},{"key":"e_1_3_2_1_51_1","volume-title":"Proceedings of 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19)","author":"Ma Lingxiao","year":"2019","unstructured":"Lingxiao Ma , Zhi Yang , Youshan Miao , Jilong Xue , Ming Wu , Lidong Zhou , and Yafei Dai . 2019 . Neugraph: parallel deep neural network computation on large graphs . In Proceedings of 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19) . 443--458. Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, and Yafei Dai. 2019. Neugraph: parallel deep neural network computation on large graphs. In Proceedings of 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19). 443--458."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00022"},{"key":"e_1_3_2_1_53_1","volume-title":"Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. arXiv preprint arXiv:2103.03330","author":"Min Seung Won","year":"2021","unstructured":"Seung Won Min , Kun Wu , Sitao Huang , Mert Hidayeto\u011flu , Jinjun Xiong , Eiman Ebrahimi , Deming Chen , and Wen-mei Hwu. 2021. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. arXiv preprint arXiv:2103.03330 ( 2021 ). Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayeto\u011flu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu. 2021. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. arXiv preprint arXiv:2103.03330 (2021)."},{"key":"e_1_3_2_1_54_1","unstructured":"MVAPICH 2022. MVAPICH: MPI over InfiniBand Omni-Path Ethernet\/iWARP and RoCE. https:\/\/mvapich.cse.ohio-state.edu\/benchmarks\/  MVAPICH 2022. MVAPICH: MPI over InfiniBand Omni-Path Ethernet\/iWARP and RoCE. https:\/\/mvapich.cse.ohio-state.edu\/benchmarks\/"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731694"},{"key":"e_1_3_2_1_56_1","unstructured":"NVT4 2022. NVIDIA T4. 
https:\/\/www.nvidia.com\/en-us\/data-center\/tesla-t4\/  NVT4 2022. NVIDIA T4. https:\/\/www.nvidia.com\/en-us\/data-center\/tesla-t4\/"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001155"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_3_2_1_59_1","unstructured":"RISC-V 2022. T-head E906 RISC-V core. https:\/\/www.t-head.cn\/product\/E906  RISC-V 2022. T-head E906 RISC-V core. https:\/\/www.t-head.cn\/product\/E906"},{"key":"e_1_3_2_1_60_1","volume-title":"Ahren Yiqiao Jin, and Song-Chun Zhu","author":"Shi Feng","year":"2021","unstructured":"Feng Shi , Ahren Yiqiao Jin, and Song-Chun Zhu . 2021 . VersaGNN: a Versatile accelerator for Graph neural networks. arXiv preprint arXiv:2105.01280 (2021). Feng Shi, Ahren Yiqiao Jin, and Song-Chun Zhu. 2021. VersaGNN: a Versatile accelerator for Graph neural networks. arXiv preprint arXiv:2105.01280 (2021)."},{"key":"e_1_3_2_1_61_1","unstructured":"SmartConnect 2022. Xilinx LogiCORE IP AXI SmartConnect. https:\/\/www.xilinx.com\/products\/intellectual-property\/smartconnect.html  SmartConnect 2022. Xilinx LogiCORE IP AXI SmartConnect. https:\/\/www.xilinx.com\/products\/intellectual-property\/smartconnect.html"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00100"},{"key":"e_1_3_2_1_64_1","unstructured":"VPU 2022. An opensource VPU implementation. https:\/\/github.com\/alibaba\/vector-accelerating-unit  VPU 2022. An opensource VPU implementation. https:\/\/github.com\/alibaba\/vector-accelerating-unit"},{"key":"e_1_3_2_1_65_1","unstructured":"VU13P 2022. Xilinx Virtex UltraScale+ VU13P devices. https:\/\/www.xilinx.com\/products\/silicon-devices\/fpga\/virtex-ultrascale-plus.html  VU13P 2022. Xilinx Virtex UltraScale+ VU13P devices. 
https:\/\/www.xilinx.com\/products\/silicon-devices\/fpga\/virtex-ultrascale-plus.html"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456229"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_2_1_68_1","volume-title":"Proceedings of 2020 USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Wang Yuke","year":"2020","unstructured":"Yuke Wang , Boyuan Feng , Gushu Li , Shuangchen Li , Lei Deng , Yuan Xie , and Yufei Ding . 2020 . Gnnadvisor: An efficient runtime system for gnn acceleration on gpus . In Proceedings of 2020 USENIX Symposium on Operating Systems Design and Implementation (OSDI). Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2020. Gnnadvisor: An efficient runtime system for gnn acceleration on gpus. In Proceedings of 2020 USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00012"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISLPED.2019.8824832"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358318"},{"key":"e_1_3_2_1_72_1","volume-title":"SpZip: Architectural Support for Effective Data Compression In Irregular Applications. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 1069--1082","author":"Yang Yifan","year":"2021","unstructured":"Yifan Yang , Joel S Emer , and Daniel Sanchez . 2021 . SpZip: Architectural Support for Effective Data Compression In Irregular Applications. In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 1069--1082 . Yifan Yang, Joel S Emer, and Daniel Sanchez. 2021. SpZip: Architectural Support for Effective Data Compression In Irregular Applications. 
In 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 1069--1082."},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00043"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jpca.0c03201"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375312"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM51124.2021.00012"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP49362.2020.00019"},{"key":"e_1_3_2_1_79_1","unstructured":"Dalong Zhang Xin Huang Ziqi Liu Zhiyang Hu Xianzheng Song Zhibang Ge Zhiqiang Zhang Lin Wang Jun Zhou Yang Shuang etal 2020. Agl: a scalable system for industrial-purpose graph machine learning. arXiv preprint arXiv:2003.02454 (2020).  Dalong Zhang Xin Huang Ziqi Liu Zhiyang Hu Xianzheng Song Zhibang Ge Zhiqiang Zhang Lin Wang Jun Zhou Yang Shuang et al. 2020. Agl: a scalable system for industrial-purpose graph machine learning. arXiv preprint arXiv:2003.02454 (2020)."},{"key":"e_1_3_2_1_80_1","first-page":"5165","article-title":"Link prediction based on graph neural networks","volume":"31","author":"Zhang Muhan","year":"2018","unstructured":"Muhan Zhang and Yixin Chen . 2018 . Link prediction based on graph neural networks . Advances in Neural Information Processing Systems 31 (2018), 5165 -- 5175 . Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. 
Advances in Neural Information Processing Systems 31 (2018), 5165--5175.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00030"},{"key":"e_1_3_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330686"},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/IA351965.2020.00011"},{"key":"e_1_3_2_1_84_1","volume-title":"GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing. arXiv preprint arXiv:2111.00680","author":"Zhou Zhe","year":"2021","unstructured":"Zhe Zhou , Cong Li , Xuechao Wei , and Guangyu Sun . 2021. GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing. arXiv preprint arXiv:2111.00680 ( 2021 ). Zhe Zhou, Cong Li, Xuechao Wei, and Guangyu Sun. 2021. GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing. arXiv preprint arXiv:2111.00680 (2021)."},{"key":"e_1_3_2_1_85_1","volume-title":"BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices. In 2021 58th ACM\/IEEE Design Automation Conference (DAC). IEEE, 1009--1014","author":"Zhou Zhe","year":"2021","unstructured":"Zhe Zhou , Bizhao Shi , Zhe Zhang , Yijin Guan , Guangyu Sun , and Guojie Luo . 2021 . BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices. In 2021 58th ACM\/IEEE Design Automation Conference (DAC). IEEE, 1009--1014 . Zhe Zhou, Bizhao Shi, Zhe Zhang, Yijin Guan, Guangyu Sun, and Guojie Luo. 2021. BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices. In 2021 58th ACM\/IEEE Design Automation Conference (DAC). IEEE, 1009--1014."},{"key":"e_1_3_2_1_86_1","volume-title":"Graph neural networks with generated parameters for relation extraction. 
arXiv preprint arXiv:1902.00756","author":"Zhu Hao","year":"2019","unstructured":"Hao Zhu , Yankai Lin , Zhiyuan Liu , Jie Fu , Tat-Seng Chua , and Maosong Sun . 2019. Graph neural networks with generated parameters for relation extraction. arXiv preprint arXiv:1902.00756 ( 2019 ). Hao Zhu, Yankai Lin, Zhiyuan Liu, Jie Fu, Tat-Seng Chua, and Maosong Sun. 2019. Graph neural networks with generated parameters for relation extraction. arXiv preprint arXiv:1902.00756 (2019)."},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352127"}],"event":{"name":"ISCA '22: The 49th Annual International Symposium on Computer Architecture","location":"New York New York","acronym":"ISCA '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE CS TCAA IEEE CS technical committee on architectural acoustics"]},"container-title":["Proceedings of the 49th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527439","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470496.3527439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:54Z","timestamp":1750191534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,11]]},"references-count":87,"alternative-id":["10.1145\/3470496.3527439","10.1145\/3470496"],"URL":"https:\/\/doi.org\/10.1145\/3470496.3527439","relation":{},"subject":[],"published":{"date-parts":[[2022,6,11]]},"assertion":[{"value":"2022-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}