{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T23:03:10Z","timestamp":1780614190816,"version":"3.54.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>Graph Neural Networks (GNNs) have gained significant popularity for learning representations of graph-structured data. Mainstream GNNs employ the message passing scheme that iteratively propagates information between connected nodes through edges. However, this scheme incurs high training costs, hindering the applicability of GNNs on large graphs. Recently, the database community has extensively researched effective solutions to facilitate efficient GNN training on massive graphs. In this tutorial, we provide a comprehensive overview of the GNN training process based on the graph data lifecycle, covering graph preprocessing, batch generation, data transfer, and model training stages. We discuss recent data management efforts aiming at accelerating individual stages or improving the overall training efficiency. Recognizing the distinct training issues associated with static and dynamic graphs, we first focus on efficient GNN training on static graphs, followed by an exploration of training GNNs on dynamic graphs. Finally, we suggest some potential research directions in this area. 
We believe this tutorial is valuable for researchers and practitioners to understand the bottleneck of GNN training and the advanced data management techniques to accelerate the training of different GNNs on massive graphs in diverse hardware settings.<\/jats:p>","DOI":"10.14778\/3685800.3685844","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4237-4240","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Efficient Training of Graph Neural Networks on Large Graphs"],"prefix":"10.14778","volume":"17","author":[{"given":"Yanyan","family":"Shen","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"HKUST, HKUST(GZ)"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jingzhi","family":"Fang","sequence":"additional","affiliation":[{"name":"HKUST"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Zhang","sequence":"additional","affiliation":[{"name":"HKUST"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shihong","family":"Gao","sequence":"additional","affiliation":[{"name":"HKUST"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hongbo","family":"Yin","sequence":"additional","affiliation":[{"name":"HKUST(GZ)"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments. arXiv:2311.13225","author":"Ai Xin","year":"2023","unstructured":"Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, and Ge Yu. 2023. 
NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments. arXiv:2311.13225 (2023)."},{"key":"e_1_2_1_2_1","volume-title":"Company-as-tribe: Company financial risk assessment on tribe-style graph with hierarchical graph neural networks. In KDD. 2712--2720.","author":"Bi Wendong","year":"2022","unstructured":"Wendong Bi, Bingbing Xu, Xiaoqian Sun, Zidong Wang, Huawei Shen, and Xueqi Cheng. 2022. Company-as-tribe: Company financial risk assessment on tribe-style graph with hierarchical graph neural networks. In KDD. 2712--2720."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456233"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577528"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Venkatesan T Chakaravarthy Shivmaran S Pandian Saurabh Raje Yogish Sabharwal Toyotaro Suzumura and Shashanka Ubaru. 2021. Efficient scaling of dynamic graph neural networks. In SC. 1--15.","DOI":"10.1145\/3458817.3480858"},{"key":"e_1_2_1_6_1","volume-title":"Exgc: Bridging efficiency and explainability in graph condensation. arXiv preprint arXiv:2402.05962","author":"Fang Junfeng","year":"2024","unstructured":"Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, and Xiangnan He. 2024. Exgc: Bridging efficiency and explainability in graph condensation. arXiv preprint arXiv:2402.05962 (2024)."},{"key":"e_1_2_1_7_1","first-page":"2734","article-title":"Optimizing DNN computation graph using graph substitutions","volume":"13","author":"Fang Jingzhi","year":"2020","unstructured":"Jingzhi Fang, Yanyan Shen, Yue Wang, and Lei Chen. 2020. Optimizing DNN computation graph using graph substitutions. 
VLDB 13, 12 (2020), 2734--2746.","journal-title":"VLDB"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3639323","article-title":"STile: Searching Hybrid Sparse Formats for Sparse Deep Learning Operators Automatically","volume":"2","author":"Fang Jingzhi","year":"2024","unstructured":"Jingzhi Fang, Yanyan Shen, Yue Wang, and Lei Chen. 2024. STile: Searching Hybrid Sparse Formats for Sparse Deep Learning Operators Automatically. Proceedings of the ACM on Management of Data 2, 1 (2024), 1--26.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_9_1","unstructured":"Swapnil Gandhi and Anand Padmanabha Iyer. 2021. P3: Distributed deep graph learning at scale. In OSDI. 551--568."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Chen Gao Xiang Wang Xiangnan He and Yong Li. 2022. Graph neural networks for recommender system. In WSDM. 1623--1625.","DOI":"10.1145\/3488560.3501396"},{"key":"e_1_2_1_11_1","first-page":"1060","article-title":"ETC: Efficient Training of Temporal Graph Neural Networks over Large-scale Dynamic Graphs","volume":"17","author":"Gao Shihong","year":"2024","unstructured":"Shihong Gao, Yiming Li, Yanyan Shen, Yingxia Shao, and Lei Chen. 2024. ETC: Efficient Training of Temporal Graph Neural Networks over Large-scale Dynamic Graphs. VLDB 17, 5 (2024), 1060--1072.","journal-title":"VLDB"},{"key":"e_1_2_1_12_1","first-page":"1","article-title":"SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement","volume":"2","author":"Gao Shihong","year":"2024","unstructured":"Shihong Gao, Yiming Li, Xin Zhang, Yanyan Shen, Yingxia Shao, and Lei Chen. 2024. SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement. 
Proceedings of the ACM on Management of Data 2, 3 (2024), 1--25.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_13_1","first-page":"1119","article-title":"Traversing large graphs on GPUs with unified memory","volume":"13","author":"Gera Prasun","year":"2020","unstructured":"Prasun Gera, Hyojong Kim, Piyush Sao, Hyesoon Kim, and David Bader. 2020. Traversing large graphs on GPUs with unified memory. VLDB 13, 7 (2020), 1119--1133.","journal-title":"VLDB"},{"key":"e_1_2_1_14_1","first-page":"22118","article-title":"Open graph benchmark: Datasets for machine learning on graphs","volume":"33","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. NeurIPS 33 (2020), 22118--22133.","journal-title":"NeurIPS"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580516"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Abhinav Jangda Sandeep Polisetty Arjun Guha and Marco Serafini. 2021. Accelerating graph sampling for graph machine learning using GPUs. In EuroSys. 311--326.","DOI":"10.1145\/3447786.3456244"},{"key":"e_1_2_1_17_1","first-page":"187","article-title":"Improving the accuracy, scalability, and performance of graph neural networks with roc","volume":"2","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. MLSys 2 (2020), 187--198.","journal-title":"MLSys"},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Zhihao Jia Sina Lin Rex Ying Jiaxuan You Jure Leskovec and Alex Aiken. 2020. Redundancy-free computation for graph neural networks. In KDD. 
997--1005.","DOI":"10.1145\/3394486.3403142"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Xunqiang Jiang Tianrui Jia Yuan Fang Chuan Shi Zhe Lin and Hui Wang. 2021. Pre-training on large-scale heterogeneous graph. In KDD. 756--766.","DOI":"10.1145\/3447548.3467396"},{"key":"e_1_2_1_20_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Kipf Thomas N","year":"2016","unstructured":"Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Haoyang Li and Lei Chen. 2021. Cache-based gnn system for dynamic graphs. In CIKM. 937--946.","DOI":"10.1145\/3459637.3482237"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588737"},{"key":"e_1_2_1_23_1","first-page":"1332","article-title":"Zebra: When Temporal Graph Neural Networks Meet Temporal Personalized PageRank","volume":"16","author":"Li Yiming","year":"2023","unstructured":"Yiming Li, Yanyan Shen, Lei Chen, and Mingxuan Yuan. 2023. Zebra: When Temporal Graph Neural Networks Meet Temporal Personalized PageRank. VLDB 16, 6 (2023), 1332--1345.","journal-title":"VLDB"},{"key":"e_1_2_1_24_1","volume-title":"Cc-gnn: A community and contraction-based graph neural network","author":"Li Zhiyuan","year":"2022","unstructured":"Zhiyuan Li, Xun Jian, Yue Wang, and Lei Chen. 2022. Cc-gnn: A community and contraction-based graph neural network. In ICDM. IEEE, 231--240."},{"key":"e_1_2_1_25_1","first-page":"1364","article-title":"DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning","volume":"17","author":"Li Zhiyuan","year":"2024","unstructured":"Zhiyuan Li, Xun Jian, Yue Wang, Yingxia Shao, and Lei Chen. 2024. DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning. VLDB 17, 6 (2024), 1364--1376. 
https:\/\/www.vldb.org\/pvldb\/vol17\/p1364-li.pdf","journal-title":"VLDB"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421281"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538614"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526134"},{"key":"e_1_2_1_29_1","unstructured":"Yuke Wang Boyuan Feng Gushu Li Shuangchen Li Lei Deng Yuan Xie and Yufei Ding. 2021. GNNAdvisor: An adaptive and efficient runtime system for GNN acceleration on GPUs. In OSDI. 515--531."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Zhe Xu Yuzhong Chen Menghai Pan Huiyuan Chen Mahashweta Das Hao Yang and Hanghang Tong. 2023. Kernel Ridge Regression-Based Graph Dataset Distillation. In KDD. 2850--2861.","DOI":"10.1145\/3580305.3599398"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Rui Xue Haoyu Han Tong Zhao Neil Shah Jiliang Tang and Xiaorui Liu. 2023. Large-Scale Graph Neural Networks: The Past and New Frontiers. In KDD. 5835--5836.","DOI":"10.1145\/3580305.3599565"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Jianbang Yang Dahai Tang Xiaoniu Song Lei Wang Qiang Yin Rong Chen Wenyuan Yu and Jingren Zhou. 2022. GNNLab: a factored system for sample-based GNN training over GPUs. In EuroSys. 417--434.","DOI":"10.1145\/3492321.3519557"},{"key":"e_1_2_1_33_1","volume-title":"Feature-Oriented Sampling for Fast and Scalable GNN Training","author":"Zhang Xin","unstructured":"Xin Zhang, Yanyan Shen, and Lei Chen. 2022. Feature-Oriented Sampling for Fast and Scalable GNN Training. In ICDM. IEEE, 723--732."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589311"},{"key":"e_1_2_1_35_1","volume-title":"NSCaching: simple and efficient negative sampling for knowledge graph embedding","author":"Zhang Yongqi","unstructured":"Yongqi Zhang, Quanming Yao, Yingxia Shao, and Lei Chen. 2019. 
NSCaching: simple and efficient negative sampling for knowledge graph embedding. In ICDE. IEEE, 614--625."},{"key":"e_1_2_1_36_1","volume-title":"Learning on large-scale text-attributed graphs via variational inference. arXiv preprint arXiv:2210.14709","author":"Zhao Jianan","year":"2022","unstructured":"Jianan Zhao, Meng Qu, Chaozhuo Li, Hao Yan, Qian Liu, Rui Li, Xing Xie, and Jian Tang. 2022. Learning on large-scale text-attributed graphs via variational inference. arXiv preprint arXiv:2210.14709 (2022)."},{"key":"e_1_2_1_37_1","volume-title":"Structure-free graph condensation: From large-scale graphs to condensed graph-free data","author":"Zheng Xin","year":"2024","unstructured":"Xin Zheng, Miao Zhang, Chunyang Chen, Quoc Viet Hung Nguyen, Xingquan Zhu, and Shirui Pan. 2024. Structure-free graph condensation: From large-scale graphs to condensed graph-free data. NeurIPS 36 (2024)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685844","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:29:47Z","timestamp":1735622987000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685844"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":37,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685844"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685844","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}