{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:03:50Z","timestamp":1775815430976,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172008"],"award-info":[{"award-number":["62172008"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Natural Science Fund for the Excellent Young Scientists Fund Program"},{"name":"Hong Kong RGC","award":["TRS-T41-603\/20-R, GRF-16213621"],"award-info":[{"award-number":["TRS-T41-603\/20-R, GRF-16213621"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. 
G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.<\/jats:p>","DOI":"10.1145\/3589288","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-23","source":"Crossref","is-referenced-by-count":43,"title":["Scalable and Efficient Full-Graph GNN Training for Large Graphs"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6503-5309","authenticated-orcid":false,"given":"Xinchen","family":"Wan","sequence":"first","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0501-5968","authenticated-orcid":false,"given":"Kaiqiang","family":"Xu","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8380-1879","authenticated-orcid":false,"given":"Xudong","family":"Liao","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9502-7622","authenticated-orcid":false,"given":"Yilun","family":"Jin","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, 
Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2587-6028","authenticated-orcid":false,"given":"Kai","family":"Chen","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8741-5847","authenticated-orcid":false,"given":"Xin","family":"Jin","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Alvaro Sanchez-Gonzalez, Jacklynn Stott, Shantanu Thakoor, and Petar Velickovic.","author":"Addanki Ravichandra","year":"2021","unstructured":"Ravichandra Addanki, Peter W. Battaglia, David Budden, Andreea Deac, Jonathan Godwin, Thomas Keck, Wai Lok Sibon Li, Alvaro Sanchez-Gonzalez, Jacklynn Stott, Shantanu Thakoor, and Petar Velickovic. 2021. Large-scale graph representation learning with very deep GNNs and self-supervision. arxiv: cs.LG\/2107.09422"},{"key":"e_1_2_2_2_1","unstructured":"Alibaba. 2020. Euler. https:\/\/github.com\/alibaba\/euler"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456233"},{"key":"e_1_2_2_6_1","volume-title":"Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247","author":"Chen Jie","year":"2018","unstructured":"Jie Chen, Tengfei Ma, and Cao Xiao. 2018. Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330925"},{"key":"e_1_2_2_8_1","volume-title":"Dearing and Xiaoyan Wang","author":"Matthew","year":"2021","unstructured":"Matthew T. Dearing and Xiaoyan Wang. 2021. 
Analyzing the Performance of Graph Neural Networks with Pipe Parallelism. (2021). arxiv: cs.LG\/2012.10840"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2012.01.020"},{"key":"e_1_2_2_10_1","volume-title":"Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019)."},{"key":"e_1_2_2_11_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Gandhi Swapnil","year":"2021","unstructured":"Swapnil Gandhi and Anand Padmanabha Iyer. 2021. P3: Distributed Deep Graph Learning at Scale. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 551--568."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41404.2022.00077"},{"key":"e_1_2_2_13_1","unstructured":"Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in neural information processing systems. 1024--1034."},{"key":"e_1_2_2_14_1","volume-title":"Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687 (2020)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441585"},{"key":"e_1_2_2_16_1","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. 
https:\/\/arxiv.org\/pdf\/1811.06965"},{"key":"e_1_2_2_17_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys)","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with Roc. Proceedings of Machine Learning and Systems (MLSys) (2020), 187--198."},{"key":"e_1_2_2_18_1","first-page":"172","article-title":"Accelerating training and inference of graph neural networks with fast sampling and pipelining","volume":"4","author":"Kaler Tim","year":"2022","unstructured":"Tim Kaler, Nickolas Stathas, Anne Ouyang, Alexandros-Stavros Iliopoulos, Tao Schardl, Charles E Leiserson, and Jie Chen. 2022. Accelerating training and inference of graph neural networks with fast sampling and pipelining. Proceedings of Machine Learning and Systems, Vol. 4 (2022), 172--189.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLOUD49709.2020.00028"},{"key":"e_1_2_2_20_1","series-title":"SIAM Journal on Scientific Computing","volume-title":"1998a. A fast and high quality multilevel scheme for partitioning irregular graphs","author":"Karypis George","year":"1998","unstructured":"George Karypis and Vipin Kumar. 1998a. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, Vol. 20, 1 (1998), 359--392."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/509058.509086"},{"key":"e_1_2_2_22_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Kipf Thomas N","year":"2016","unstructured":"Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_2_2_23_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, Vol. 25 (2012), 1097--1105."},{"key":"e_1_2_2_24_1","volume-title":"International conference on machine learning. PMLR, 6437--6449","author":"Li Guohao","year":"2021","unstructured":"Guohao Li, Matthias M\u00fcller, Bernard Ghanem, and Vladlen Koltun. 2021. Training graph neural networks with 1000 layers. In International conference on machine learning. PMLR, 6437--6449."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421281"},{"key":"e_1_2_2_27_1","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Ma Lingxiao","year":"2019","unstructured":"Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, and Yafei Dai. 2019. Neugraph: parallel deep neural network computation on large graphs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 443--458."},{"key":"e_1_2_2_28_1","unstructured":"Hesham Mostafa. 2021. Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. 
arxiv: cs.LG\/2111.06483"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538614"},{"key":"e_1_2_2_31_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Thorpe John","year":"2021","unstructured":"John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, et al. 2021. Dorylus: affordable, scalable, and accurate GNN training with distributed CPU servers and serverless threads. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 495--514."},{"key":"e_1_2_2_32_1","volume-title":"Graph attention networks. arXiv preprint arXiv:1710.10903","author":"Veli\u010dkovi\u0107 Petar","year":"2017","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)."},{"key":"e_1_2_2_33_1","volume-title":"Nam Sung Kim, and Yingyan Lin","author":"Wan Cheng","year":"2022","unstructured":"Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, and Yingyan Lin. 2022b. BNS-GCN: Efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. Proceedings of Machine Learning and Systems, Vol. 4 (2022)."},{"key":"e_1_2_2_34_1","volume-title":"Nam Sung Kim, and Yingyan Lin","author":"Wan Cheng","year":"2022","unstructured":"Cheng Wan, Youjie Li, Cameron R Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin. 2022c. Pipegcn: Efficient full-graph training of graph convolutional networks with pipelined feature communication. arXiv preprint arXiv:2203.10428 (2022)."},{"key":"e_1_2_2_35_1","volume-title":"DGS: Communication-Efficient Graph Sampling for Distributed GNN Training. In 2022 IEEE 30th International Conference on Network Protocols (ICNP). 
IEEE, 1--11","author":"Wan Xinchen","year":"2022","unstructured":"Xinchen Wan, Kai Chen, and Yiming Zhang. 2022a. DGS: Communication-Efficient Graph Sampling for Distributed GNN Training. In 2022 IEEE 30th International Conference on Network Protocols (ICNP). IEEE, 1--11."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411029.3411037"},{"key":"e_1_2_2_37_1","volume-title":"Domain-specific communication optimization for distributed DNN training. arXiv preprint arXiv:2008.08445","author":"Wang Hao","year":"2020","unstructured":"Hao Wang, Jingrong Chen, Xinchen Wan, Han Tian, Jiacheng Xia, Gaoxiong Zeng, Weiyan Wang, Kai Chen, Wei Bai, and Junchen Jiang. 2020a. Domain-specific communication optimization for distributed DNN training. arXiv preprint arXiv:2008.08445 (2020)."},{"key":"e_1_2_2_38_1","unstructured":"Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2020b. Deep Graph Library: A Graph-Centric Highly-Performant Package for Graph Neural Networks. arxiv: cs.LG\/1909.01315"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526134"},{"key":"e_1_2_2_40_1","volume-title":"GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Wang Yuke","year":"2021","unstructured":"Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 515--531."},{"key":"e_1_2_2_41_1","volume-title":"Efficient dnn training with knowledge-guided layer freezing. arXiv preprint arXiv:2201.06227","author":"Wang Yiding","year":"2022","unstructured":"Yiding Wang, Decang Sun, Kai Chen, Fan Lai, and Mosharaf Chowdhury. 2022a. 
Efficient dnn training with knowledge-guided layer freezing. arXiv preprint arXiv:2201.06227 (2022)."},{"key":"e_1_2_2_42_1","volume-title":"International conference on machine learning. PMLR, 6861--6871","author":"Wu Felix","year":"2019","unstructured":"Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861--6871."},{"key":"e_1_2_2_43_1","unstructured":"Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks? arxiv: cs.LG\/1810.00826"},{"key":"e_1_2_2_44_1","volume-title":"Tacc: A full-stack cloud computing infrastructure for machine learning tasks. arXiv preprint arXiv:2110.01556","author":"Xu Kaiqiang","year":"2021","unstructured":"Kaiqiang Xu, Xinchen Wan, Hao Wang, Zhenghang Ren, Xudong Liao, Decang Sun, Chaoliang Zeng, and Kai Chen. 2021. Tacc: A full-stack cloud computing infrastructure for machine learning tasks. arXiv preprint arXiv:2110.01556 (2021)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017370"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_2_2_48_1","volume-title":"Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931","author":"Zeng Hanqing","year":"2019","unstructured":"Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2019a. Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931 (2019)."},{"key":"e_1_2_2_49_1","volume-title":"Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931","author":"Zeng Hanqing","year":"2019","unstructured":"Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2019b. 
Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931 (2019)."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415539"},{"key":"e_1_2_2_51_1","volume-title":"DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs. arXiv preprint arXiv:2010.05337","author":"Zheng Da","year":"2020","unstructured":"Da Zheng, Chao Ma, Minjie Wang, Jinjing Zhou, Qidong Su, Xiang Song, Quan Gan, Zheng Zhang, and George Karypis. 2020. DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs. arXiv preprint arXiv:2010.05337 (2020)."},{"key":"e_1_2_2_52_1","volume-title":"Aligraph: A comprehensive graph neural network platform. arXiv preprint arXiv:1902.08730","author":"Zhu Rong","year":"2019","unstructured":"Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. Aligraph: A comprehensive graph neural network platform. arXiv preprint arXiv:1902.08730 (2019)."},{"key":"e_1_2_2_53_1","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16)","author":"Zhu Xiaowei","year":"2016","unstructured":"Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). 
USENIX Association, USA, 301--316."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589288","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:54Z","timestamp":1750182534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589288"],"URL":"https:\/\/doi.org\/10.1145\/3589288","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}