{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:53:43Z","timestamp":1773482023949,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072083, U2241212 and 62072082"],"award-info":[{"award-number":["62072083, U2241212 and 62072082"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"the Fundamental Research Funds for the Central Universities","award":["N2216017"],"award-info":[{"award-number":["N2216017"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,12,8]]},"abstract":"<jats:p>Distributed computing is promising to enable large-scale graph neural network (GNN) model training. However, care is needed to avoid excessive computational and communication overheads. Sampling is promising in terms of enabling scalability, and sampling techniques have been proposed to reduce training costs. However, online sampling introduces large overheads, and while offline sampling that is done only once can eliminate such overheads, it instead introduces information loss and accuracy degradation. Thus, existing sampling techniques are unable to improve simultaneously both efficiency and accuracy, particularly at low sampling rates. We develop a distributed system, ADGNN, for full-batch based GNN training that adopts a hybrid sampling architecture to enable a trade-off between efficiency and accuracy. Specifically, ADGNN employs sampling result reuse techniques to reduce the cost associated with sampling and thus improve training efficiency. To alleviate accuracy degradation, we introduce a new metric,Aggregation Difference (AD), that quantifies the gap between sampled and full neighbor set aggregation. We present so-called AD-Sampling that aims to minimize the Aggregation Difference with an adaptive sampling frequency tuner. Finally, ADGNN employs anAD -importance-based sampling technique for remote neighbors to further reduce communication costs. 
Experiments on five real datasets show that ADGNN is able to outperform the state-of-the-art by up to nearly 9 times in terms of efficiency, while achieving comparable accuracy to the non-sampling methods.<\/jats:p>","DOI":"10.1145\/3626716","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T14:01:21Z","timestamp":1702389681000},"page":"1-26","source":"Crossref","is-referenced-by-count":7,"title":["ADGNN: Towards Scalable GNN Training with Aggregation-Difference Aware Sampling"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8737-2727","authenticated-orcid":false,"given":"Zhen","family":"Song","sequence":"first","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7422-6254","authenticated-orcid":false,"given":"Yu","family":"Gu","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5424-6442","authenticated-orcid":false,"given":"Tianyi","family":"Li","sequence":"additional","affiliation":[{"name":"Aalborg University, Aalborg, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2365-9766","authenticated-orcid":false,"given":"Qing","family":"Sun","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9871-0304","authenticated-orcid":false,"given":"Yanfeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9697-7670","authenticated-orcid":false,"given":"Christian S.","family":"Jensen","sequence":"additional","affiliation":[{"name":"Aalborg University, Aalborg, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3171-8889","authenticated-orcid":false,"given":"Ge","family":"Yu","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]}],"member":"320","published-online":{"date-parts":[[2023,12,12]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Jiyang Bai Yuxiang Ren and Jiawei Zhang. 2021. Ripple Walk Training: A Subgraph-based Training Framework for Large and Deep Graph Neural Network. In IJCNN. 1--8.","DOI":"10.1109\/IJCNN52387.2021.9533429"},{"key":"e_1_2_2_2_1","volume-title":"cC ataly\u00fc rek","author":"Balin Muhammed Fatih","year":"2022","unstructured":"Muhammed Fatih Balin and \u00dc mit V. cC ataly\u00fc rek. 2022. (LA)yer-neigh(BOR) Sampling: Defusing Neighborhood Explosion in GNNs. CoRR, Vol. abs\/2210.13339 (2022)."},{"key":"e_1_2_2_3_1","volume-title":"Fei Wang, and Hao Yang.","author":"Chen Huiyuan","year":"2022","unstructured":"Huiyuan Chen, Chin-Chia Michael Yeh, Fei Wang, and Hao Yang. 2022. Graph Neural Transport Networks with Non-local Attentions for Recommender Systems. In WWW. 1955--1964."},{"key":"e_1_2_2_4_1","unstructured":"Jie Chen Tengfei Ma and Cao Xiao. 2018a. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In ICLR."},{"key":"e_1_2_2_5_1","unstructured":"Jianfei Chen Jun Zhu and Le Song. 2018b. Stochastic Training of Graph Convolutional Networks with Variance Reduction. In ICML. 941--949."},{"key":"e_1_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Wei-Lin Chiang Xuanqing Liu Si Si Yang Li Samy Bengio and Cho-Jui Hsieh. 2019. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In KDD. 
257--266.","DOI":"10.1145\/3292500.3330925"},{"key":"e_1_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Weilin Cong Rana Forsati Mahmut T. Kandemir and Mehrdad Mahdavi. 2020. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks. In KDD. 1393--1403.","DOI":"10.1145\/3394486.3403192"},{"key":"e_1_2_2_8_1","first-page":"1114","article-title":"Learning Steady-States of Iterative Algorithms over Graphs","volume":"80","author":"Dai Hanjun","year":"2018","unstructured":"Hanjun Dai, Zornitsa Kozareva, Bo Dai, Alexander J. Smola, and Le Song. 2018. Learning Steady-States of Iterative Algorithms over Graphs. In ICML, Vol. 80. 1114--1122.","journal-title":"ICML"},{"key":"e_1_2_2_9_1","unstructured":"Micha\u00eb l Defferrard Xavier Bresson and Pierre Vandergheynst. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NeurIPS. 3837--3845."},{"key":"e_1_2_2_10_1","volume-title":"Fast Graph Representation Learning with PyTorch Geometric. CoRR","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan Eric Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. CoRR, Vol. abs\/1903.02428 (2019)."},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Chen Gao Xiang Wang Xiangnan He and Yong Li. 2022b. Graph Neural Networks for Recommender System. In WSDM. 1623--1625.","DOI":"10.1145\/3488560.3501396"},{"key":"e_1_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Yunjun Gao Xiaoze Liu Junyang Wu Tianyi Li Pengfei Wang and Lu Chen. 2022a. ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities. In KDD. 421--431.","DOI":"10.1145\/3534678.3539331"},{"key":"e_1_2_2_13_1","first-page":"3182","article-title":"Distributed Hypergraph Processing Using Intersection Graphs","volume":"34","author":"Gu Yu","year":"2022","unstructured":"Yu Gu, Kaiqiang Yu, Zhen Song, Jianzhong Qi, Zhigang Wang, Ge Yu, and Rui Zhang. 2022. Distributed Hypergraph Processing Using Intersection Graphs. IEEE Trans. Knowl. Data Eng., Vol. 34, 7 (2022), 3182--3195.","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"e_1_2_2_14_1","unstructured":"William L. Hamilton Zhitao Ying and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS. 1024--1034."},{"key":"e_1_2_2_15_1","first-page":"22118","article-title":"Open Graph Benchmark: Datasets for Machine Learning on Graphs","volume":"33","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open Graph Benchmark: Datasets for Machine Learning on Graphs. In NeurIPS, Vol. 33. 22118--22133.","journal-title":"NeurIPS"},{"key":"e_1_2_2_16_1","unstructured":"Wen-bing Huang Tong Zhang Yu Rong and Junzhou Huang. 2018. Adaptive Sampling Towards Fast Graph Representation Learning. In NeurIPS. 4563--4572."},{"key":"e_1_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Zijian Huang Meng-Fen Chiang and Wang-Chien Lee. 2022. LinE: Logical Query Reasoning over Hierarchical Knowledge Graphs. In KDD. 615--625.","DOI":"10.1145\/3534678.3539338"},{"key":"e_1_2_2_18_1","first-page":"187","article-title":"Improving the accuracy, scalability, and performance of graph neural networks with roc","volume":"2","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. 
Improving the accuracy, scalability, and performance of graph neural networks with roc. Proceedings of Machine Learning and Systems, Vol. 2, 187--198.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_2_19_1","doi-asserted-by":"crossref","unstructured":"George Karypis and Vipin Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel Distributed Comput. (1998) 96--129.","DOI":"10.1006\/jpdc.1997.1404"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588737"},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Xiaoze Liu Junyang Wu Tianyi Li Lu Chen and Yunjun Gao. 2023. Unsupervised Entity Alignment for Temporal Knowledge Graphs. In WWW. 2528--2538.","DOI":"10.1145\/3543507.3583381"},{"key":"e_1_2_2_22_1","unstructured":"Lingxiao Ma Zhi Yang Youshan Miao Jilong Xue Ming Wu Lidong Zhou and Yafei Dai. 2019. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. In ATC. 443--458."},{"key":"e_1_2_2_23_1","unstructured":"Qianwen Ma Chunyuan Yuan Wei Zhou and Songlin Hu. 2021. Label-Specific Dual Graph Neural Network for Multi-Label Text Classification. In ACL\/IJCNLP. 3855--3864."},{"key":"e_1_2_2_24_1","first-page":"1","article-title":"DistGNN: scalable distributed training for large-scale graph neural networks","volume":"76","author":"Md Vasimuddin","year":"2021","unstructured":"Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj D. Kalamkar, Nesreen K. Ahmed, and Sasikanth Avancha. 2021. DistGNN: scalable distributed training for large-scale graph neural networks. In SC. 76:1--76:14.","journal-title":"SC."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551819"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538614"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Jingshu Peng Yanyan Shen and Lei Chen. 2021. GraphANGEL: Adaptive aNd Structure-Aware Sampling on Graph NEuraL Networks. In ICDM. 479--488.","DOI":"10.1109\/ICDM51629.2021.00059"},{"key":"e_1_2_2_28_1","volume-title":"LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence. arXiv preprint arXiv:2302.00924","author":"Shi Zhihao","year":"2023","unstructured":"Zhihao Shi, Xize Liang, and Jie Wang. 2023. LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence. arXiv preprint arXiv:2302.00924 (2023)."},{"key":"e_1_2_2_29_1","first-page":"648","article-title":"EC-Graph","volume":"2022","author":"Song Zhen","year":"2022","unstructured":"Zhen Song, Yu Gu, Jianzhong Qi, Zhigang Wang, and Ge Yu. 2022a. EC-Graph: A Distributed Graph Neural Network System with Error-Compensated Compression. In ICDE 2022. 648--660.","journal-title":"In ICDE"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-021-0445-2"},{"key":"e_1_2_2_31_1","volume-title":"Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads. In OSDI. 495--514.","author":"Thorpe John","year":"2021","unstructured":"John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu. 2021. Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads. In OSDI. 495--514."},{"key":"e_1_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Alok Tripathy Katherine A. Yelick and Aydin Bulucc. 2020. 
Reducing communication in graph neural network training. In SC. 1--14.","DOI":"10.1109\/SC41405.2020.00074"},{"key":"e_1_2_2_33_1","unstructured":"Petar Velivc kovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Lio and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903."},{"key":"e_1_2_2_34_1","volume-title":"Nam Sung Kim, and Yingyan Lin","author":"Wan Cheng","year":"2022","unstructured":"Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, and Yingyan Lin. 2022b. BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling. In MLSys. 673--693."},{"key":"e_1_2_2_35_1","volume-title":"DGS: Communication-Efficient Graph Sampling for Distributed GNN Training. In ICNP. 1--11.","author":"Wan Xinchen","year":"2022","unstructured":"Xinchen Wan, Kai Chen, and Yiming Zhang. 2022a. DGS: Communication-Efficient Graph Sampling for Distributed GNN Training. In ICNP. 1--11."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589288"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Qiange Wang Yanfeng Zhang Hao Wang Chaoyi Chen Xiaodong Zhang and Ge Yu. 2022. NeutronStar: Distributed GNN Training with Hybrid Dependency Management. In SIGMOD. 1301--1315.","DOI":"10.1145\/3514221.3526134"},{"key":"e_1_2_2_38_1","first-page":"1973","article-title":"HGraph: I\/O-Efficient Distributed and Iterative Graph Computing by Hybrid Pushing\/Pulling","volume":"33","author":"Wang Zhigang","year":"2021","unstructured":"Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, Jeffrey Xu Yu, and Zhiqiang Wei. 2021. HGraph: I\/O-Efficient Distributed and Iterative Graph Computing by Hybrid Pushing\/Pulling. IEEE Trans. Knowl. Data Eng., Vol. 33, 5 (2021), 1973--1987.","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"e_1_2_2_39_1","volume-title":"SEA: A Scalable Entity Alignment System. In SIGIR. 3175--3179.","author":"Wu Junyang","year":"2023","unstructured":"Junyang Wu, Tianyi Li, Lu Chen, Yunjun Gao, and Ziheng Wei. 2023. SEA: A Scalable Entity Alignment System. In SIGIR. 3175--3179."},{"key":"e_1_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Yirong Wu Jialin Chen and Tinglong Tang. 2022. Feature Enhanced Graph Neural Network for Few-Shot Image Classification. In CSCWD. 513--518.","DOI":"10.1109\/CSCWD54268.2022.9776302"},{"key":"e_1_2_2_41_1","doi-asserted-by":"crossref","unstructured":"Yifan Xing Tong He Tianjun Xiao Yongxin Wang Yuanjun Xiong Wei Xia David Wipf Zheng Zhang and Stefano Soatto. 2021. Learning Hierarchical Graph Neural Networks for Image Clustering. In ICCV. 3447--3457.","DOI":"10.1109\/ICCV48922.2021.00345"},{"key":"e_1_2_2_42_1","unstructured":"Keyulu Xu Weihua Hu Jure Leskovec and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826."},{"key":"e_1_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Rex Ying Ruining He Kaifeng Chen Pong Eksombatchai William L. Hamilton and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD. 974--983.","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_2_2_44_1","volume-title":"Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931.","author":"Zeng Hanqing","year":"2019","unstructured":"Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2019a. Graphsaint: Graph sampling based inductive learning method. 
arXiv preprint arXiv:1907.04931."},{"key":"e_1_2_2_45_1","volume-title":"Prasanna","author":"Zeng Hanqing","year":"2019","unstructured":"Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2019b. Accurate, Efficient and Scalable Graph Embedding. In IPDPS. 462--471."},{"key":"e_1_2_2_46_1","first-page":"3125","article-title":"AGL","volume":"13","author":"Zhang Dalong","year":"2020","unstructured":"Dalong Zhang, Xin Huang, Ziqi Liu, Jun Zhou, Zhiyang Hu, Xianzheng Song, Zhibang Ge, Lin Wang, Zhiqiang Zhang, and Yuan Qi. 2020. AGL: A Scalable System for Industrial-purpose Graph Machine Learning. PVLDB, Vol. 13, 12 (2020), 3125--3137.","journal-title":"A Scalable System for Industrial-purpose Graph Machine Learning. PVLDB"},{"key":"e_1_2_2_47_1","unstructured":"Muhan Zhang and Yixin Chen. 2018. Link Prediction Based on Graph Neural Networks. In NeurIPS. 5171--5181."},{"key":"e_1_2_2_48_1","doi-asserted-by":"crossref","unstructured":"Xin Zhang Yanyan Shen and Lei Chen. 2022. Feature-Oriented Sampling for Fast and Scalable GNN Training. In ICDM. 723--732.","DOI":"10.1109\/ICDM54844.2022.00083"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589311"},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Da Zheng Chao Ma Minjie Wang Jinjing Zhou Qidong Su Xiang Song Quan Gan Zheng Zhang and George Karypis. 2020. DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs. In SC. 36--44.","DOI":"10.1109\/IA351965.2020.00011"},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Jason Zhu Yanling Cui Yuming Liu Hao Sun Xue Li Markus Pelger Tianqi Yang Liangjie Zhang Ruofei Zhang and Huasha Zhao. 2021. TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search. In WWW. 2848--2857.","DOI":"10.1145\/3442381.3449842"},{"key":"e_1_2_2_52_1","first-page":"2094","article-title":"AliGraph","volume":"12","author":"Zhu Rong","year":"2019","unstructured":"Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. AliGraph: A Comprehensive Graph Neural Network Platform. PVLDB, Vol. 12 (2019), 2094--2105.","journal-title":"A Comprehensive Graph Neural Network Platform. PVLDB"},{"key":"e_1_2_2_53_1","unstructured":"Difan Zou Ziniu Hu Yewen Wang Song Jiang Yizhou Sun and Quanquan Gu. 2019. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In NeurIPS. 11247--11256."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:03:33Z","timestamp":1755867813000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,8]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,8]]}},"alternative-id":["10.1145\/3626716"],"URL":"https:\/\/doi.org\/10.1145\/3626716","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,8]]}}}