{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T00:02:17Z","timestamp":1755993737408,"version":"3.44.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"CoNEXT4","license":[{"start":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T00:00:00Z","timestamp":1732492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Microsoft Research Fellowship","award":["8300751"],"award-info":[{"award-number":["8300751"]}]},{"DOI":"10.13039\/501100006374","name":"NSF","doi-asserted-by":"publisher","award":["2319988, 2206522"],"award-info":[{"award-number":["2319988, 2206522"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Netw."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:p>Despite recent breakthroughs in distributed Graph Neural Network (GNN) training, large-scale graphs still generate significant network communication overhead, decreasing time and resource efficiency. Although recently proposed partitioning or caching methods try to reduce communication inefficiencies and overheads, they are not sufficiently effective due to their sampling pattern-agnostic nature. This paper proposes a Pipelined Partition Aware Caching and Communication Efficient Refinement System (Pacer), a communication-efficient distributed GNN training system. First, Pacer intelligently estimates each partition's access frequency to each vertex by jointly considering the sampling method and graph topology. Then, it uses the estimated access frequency to refine partitions and caching vertices in its two-level cache (CPU and GPU) to minimize data transfer latency. 
Furthermore, Pacer incorporates a pipeline-based minibatching method to mask the effect of network communication. Experimental results on real-world graphs show that Pacer outperforms state-of-the-art distributed GNN training systems in training time by 40% on average.<\/jats:p>","DOI":"10.1145\/3697805","type":"journal-article","created":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T11:15:47Z","timestamp":1732533347000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["PACER: Accelerating Distributed GNN Training Using Communication-Efficient Partition Refinement and Caching"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-5989-078X","authenticated-orcid":false,"given":"Shohaib","family":"Mahmud","sequence":"first","affiliation":[{"name":"University of Virginia, Charlottesville, VA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7548-6223","authenticated-orcid":false,"given":"Haiying","family":"Shen","sequence":"additional","affiliation":[{"name":"University of Virginia, Charlottesville, VA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5952-3346","authenticated-orcid":false,"given":"Anand","family":"Iyer","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,11,25]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"12th USENIX symposium on operating systems design and implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16). 
265--283."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_1_4_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015)."},{"key":"e_1_2_1_5_1","volume-title":"Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019)."},{"key":"e_1_2_1_6_1","volume-title":"Protein interface prediction using graph convolutional networks. Advances in neural information processing systems","author":"Fout Alex","year":"2017","unstructured":"Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. 2017. Protein interface prediction using graph convolutional networks. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_2_1_7_1","unstructured":"Swapnil Gandhi and Anand Padmanabha Iyer. 2021. P3: Distributed Deep Graph Learning at Scale.. In OSDI. 551--568."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2020.2986316"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035","author":"Hamilton William L","year":"2017","unstructured":"William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. 
In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035."},{"key":"e_1_2_1_10_1","unstructured":"https:\/\/github.com\/dmlc\/dgl. 2022. Deep Graph Library (DGL)."},{"key":"e_1_2_1_11_1","volume-title":"Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687 (2020)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456244"},{"key":"e_1_2_1_13_1","first-page":"187","article-title":"Improving the accuracy, scalability, and performance of graph neural networks with roc","volume":"2","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. Proceedings of Machine Learning and Systems , Vol. 2 (2020), 187--198.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_1_14_1","volume-title":"METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices.","author":"Karypis George","year":"1997","unstructured":"George Karypis and Vipin Kumar. 1997. METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. (1997)."},{"key":"e_1_2_1_15_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Kipf Thomas N","year":"2016","unstructured":"Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings 14","author":"Kirsch Adam","year":"2006","unstructured":"Adam Kirsch and Michael Mitzenmacher. 2006. Less hashing, same performance: Building a better bloom filter. In Algorithms--ESA 2006: 14th Annual European Symposium, Zurich, Switzerland, September 11--13, 2006. Proceedings 14. Springer, 456--467."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487788.2488173"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421281"},{"key":"e_1_2_1_19_1","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Liu Tianfeng","year":"2023","unstructured":"Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, and Chuanxiong Guo. 2023. BGL: GPU-Efficient GNN Training by Optimizing Graph Data I\/O and Preprocessing. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 103--118."},{"key":"e_1_2_1_20_1","volume-title":"Machine learning in chemoinformatics and drug discovery. Drug discovery today","author":"Lo Yu-Chen","year":"2018","unstructured":"Yu-Chen Lo, Stefano E Rensi, Wen Torng, and Russ B Altman. 2018. Machine learning in chemoinformatics and drug discovery. Drug discovery today, Vol. 23, 8 (2018), 1538--1546."},{"key":"e_1_2_1_21_1","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Ma Lingxiao","year":"2019","unstructured":"Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, and Yafei Dai. 2019. Neugraph: parallel deep neural network computation on large graphs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 
443--458."},{"key":"e_1_2_1_22_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI) 21","author":"Mohoney Jason","year":"2021","unstructured":"Jason Mohoney, Roger Waleffe, Henry Xu, Theodoros Rekatsinas, and Shivaram Venkataraman. 2021. Marius: Learning massive graph embeddings on a single machine. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI) 21."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403280"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00060"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5984"},{"key":"e_1_2_1_26_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, Vol. 32 (2019), 8026--8037."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/52324.52339"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2020.01.021"},{"key":"e_1_2_1_29_1","volume-title":"Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training. In 2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Sun Jie","year":"2023","unstructured":"Jie Sun, Li Su, Zuocheng Shi, Wenting Shen, Zeke Wang, Lei Wang, Jie Zhang, Yong Li, Wenyuan Yu, Jingren Zhou, and Fei Wu. 2023. Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). USENIX Association, Boston, MA, 165--179. 
https:\/\/www.usenix.org\/conference\/atc23\/presentation\/sun"},{"key":"e_1_2_1_30_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Thorpe John","year":"2021","unstructured":"John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, et al. 2021. Dorylus: affordable, scalable, and accurate GNN training with distributed CPU servers and serverless threads. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 495--514."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433794"},{"key":"e_1_2_1_32_1","volume-title":"Graph Attention Networks. International Conference on Learning Representations","author":"Veli\u010dkovi\u0107 Petar","year":"2018","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph Attention Networks. International Conference on Learning Representations (2018). https:\/\/openreview.net\/forum?id=rJXMpikCZ"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456229"},{"key":"e_1_2_1_34_1","unstructured":"Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. (2019)."},{"key":"e_1_2_1_35_1","volume-title":"15th USENIX symposium on operating systems design and implementation (OSDI 21)","author":"Wang Yuke","year":"2021","unstructured":"Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021. GNNAdvisor: An adaptive and efficient runtime system for GNN acceleration on GPUs. In 15th USENIX symposium on operating systems design and implementation (OSDI 21)."},{"key":"e_1_2_1_36_1","unstructured":"Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Yufei Ding. 2023. 
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). 779--795."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449884"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456247"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519557"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_2_1_41_1","volume-title":"International Conference on Machine Learning. PMLR, 25684--25701","author":"Yu Haiyang","year":"2022","unstructured":"Haiyang Yu, Limei Wang, Bokun Wang, Meng Liu, Tianbao Yang, and Shuiwang Ji. 2022. GraphFM: Improving large-scale GNN training via feature momentum. In International Conference on Machine Learning. PMLR, 25684--25701."},{"key":"e_1_2_1_42_1","unstructured":"Dalong Zhang, Xin Huang, Ziqi Liu, Zhiyang Hu, Xianzheng Song, Zhibang Ge, Zhiqiang Zhang, Lin Wang, Jun Zhou, Yang Shuang, et al. 2020. Agl: a scalable system for industrial-purpose graph machine learning. arXiv preprint arXiv:2003.02454 (2020)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539415"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2019.2935152"},{"key":"e_1_2_1_45_1","volume-title":"Aligraph: a comprehensive graph neural network platform. arXiv preprint arXiv:1902.08730","author":"Zhu Rong","year":"2019","unstructured":"Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. Aligraph: a comprehensive graph neural network platform. 
arXiv preprint arXiv:1902.08730 (2019)."}],"container-title":["Proceedings of the ACM on Networking"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697805","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3697805","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T01:25:19Z","timestamp":1755912319000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697805"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,25]]},"references-count":45,"journal-issue":{"issue":"CoNEXT4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10.1145\/3697805"],"URL":"https:\/\/doi.org\/10.1145\/3697805","relation":{},"ISSN":["2834-5509"],"issn-type":[{"type":"electronic","value":"2834-5509"}],"subject":[],"published":{"date-parts":[[2024,11,25]]},"assertion":[{"value":"2024-11-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}