{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:33:53Z","timestamp":1772724833661,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>\n            Sampling-based Graph Neural Networks (GNNs) have become the de facto standard for handling various graph learning tasks on large-scale graphs. As the graph size grows larger and even exceeds the standard host memory size of a single machine, out-of-core sampling-based GNN training has gained attention from the community. For out-of-core sampling-based GNN training, the performance bottleneck is the data preparation process that includes sampling neighbor lists and gathering node features from external storage. Based on this observation, existing out-of-core GNN training frameworks try to accomplish larger percentages of data requests without inquiring the external storage by designing better in-memory caches. However, the enormous overall requested data volume is unchanged under this approach. In this paper, we present a new perspective on\n            <jats:italic>reducing the overall requested data volume.<\/jats:italic>\n            Through a quantitative analysis, we find that\n            <jats:italic>Neighborhood Redundancy<\/jats:italic>\n            and\n            <jats:italic>Temporal Redundancy<\/jats:italic>\n            exist in out-of-core sampling-based GNN training. To reduce these two kinds of data redundancies, we propose OUTRE, an OUT-of-core de-REdundancy GNN training framework. OUTRE incorporates two new designs,\n            <jats:italic>partition-based batch construction<\/jats:italic>\n            and\n            <jats:italic>historical embedding cache<\/jats:italic>\n            , to reduce the corresponding data redundancies. Moreover, we propose\n            <jats:italic>automatic cache space management<\/jats:italic>\n            to automatically organize available memory for different caches. Evaluation results on four public large-scale graph datasets show that OUTRE achieves 1.52\u00d7 to 3.51\u00d7 speedup against the SOTA framework.\n          <\/jats:p>","DOI":"10.14778\/3681954.3681976","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"2960-2973","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["OUTRE: An OUT-of-Core De-REdundancy GNN Training Framework for Massive Graphs within A Single Machine"],"prefix":"10.14778","volume":"17","author":[{"given":"Zeang","family":"Sheng","sequence":"first","affiliation":[{"name":"Peking University"}]},{"given":"Wentao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Yangyu","family":"Tao","sequence":"additional","affiliation":[{"name":"Tencent Inc"}]},{"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-021-0420-2"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.52.0078"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_4_1","first-page":"5103","article-title":"Line graph neural networks for link prediction","volume":"44","author":"Cai Lei","year":"2021","unstructured":"Lei Cai, Jundong Li, Jie Wang, and Shuiwang Ji. 2021. Line graph neural networks for link prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2021), 5103--5113.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456233"},{"key":"e_1_2_1_6_1","volume-title":"International Conference on Learning Representations.","author":"Chen Jie","year":"2018","unstructured":"Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations."},{"key":"e_1_2_1_7_1","volume-title":"Stochastic Training of Graph Convolutional Networks with Variance Reduction. In International Conference on Machine Learning. PMLR, 942--950","author":"Chen Jianfei","year":"2018","unstructured":"Jianfei Chen, Jun Zhu, and Le Song. 2018. Stochastic Training of Graph Convolutional Networks with Variance Reduction. In International Conference on Machine Learning. PMLR, 942--950."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-022-00190-8"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330925"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5 1 (1998) 46--55.","DOI":"10.1109\/99.660313"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-022-00180-w"},{"key":"e_1_2_1_12_1","volume-title":"Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019)."},{"key":"e_1_2_1_13_1","volume-title":"International conference on machine learning. PMLR, 3294--3304","author":"Fey Matthias","year":"2021","unstructured":"Matthias Fey, Jan E Lenssen, Frank Weichert, and Jure Leskovec. 2021. Gnnautoscale: Scalable and expressive graph neural networks via historical embeddings. In International conference on machine learning. PMLR, 3294--3304."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-023-00222-x"},{"key":"e_1_2_1_15_1","volume-title":"15th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 21). 551--568.","author":"Gandhi Swapnil","unstructured":"Swapnil Gandhi and Anand Padmanabha Iyer. 2021. P3: Distributed deep graph learning at scale. In 15th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 21). 551--568."},{"key":"e_1_2_1_16_1","unstructured":"Will Hamilton Zhitao Ying and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024--1034."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-022-3793-7"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/359138.359141"},{"key":"e_1_2_1_20_1","volume-title":"Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv preprint arXiv:2005.00687","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv preprint arXiv:2005.00687 (2020)."},{"key":"e_1_2_1_21_1","volume-title":"ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training. arXiv preprint arXiv:2301.07482","author":"Huang Kezhao","year":"2023","unstructured":"Kezhao Huang, Haitian Jiang, Minjie Wang, Guangxuan Xiao, David Wipf, Xiang Song, Quan Gan, Zengfeng Huang, Jidong Zhai, and Zheng Zhang. 2023. ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training. arXiv preprint arXiv:2301.07482 (2023)."},{"key":"e_1_2_1_22_1","unstructured":"Tianhao Huang Xuhao Chen Muhua Xu Arvind Arvind and Jie Chen. 2023. HierBatching: Locality-Aware Out-of-Core Training of Graph Neural Networks. https:\/\/openreview.net\/forum?id=WWD_2DKUqdJ"},{"key":"e_1_2_1_23_1","first-page":"187","article-title":"Improving the accuracy, scalability, and performance of graph neural networks with roc","volume":"2","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. Proceedings of Machine Learning and Systems 2 (2020), 187--198.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-020-00479-8"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-020-3318-5"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/305219.305248"},{"key":"e_1_2_1_27_1","volume-title":"Bhagyashree Taleka, Tengfei Ma, Xiang Song, and Wen-mei Hwu.","author":"Khatua Arpandeep","year":"2023","unstructured":"Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, and Wen-mei Hwu. 2023. IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. arXiv preprint arXiv:2302.13522 (2023)."},{"key":"e_1_2_1_28_1","volume-title":"Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJU4ayYgl","author":"Thomas","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJU4ayYgl"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-023-2875-9"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-023-00215-w"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-024-4150-0"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421281"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-021-3443-y"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476264"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530503"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btaa921"},{"key":"e_1_2_1_37_1","unstructured":"Lawrence Page Sergey Brin Rajeev Motwani Terry Winograd et al. 1999. The pagerank citation ranking: Bringing order to the web. (1999)."},{"key":"e_1_2_1_38_1","volume-title":"Zaid Qureshi, and Wenmei Hwu.","author":"Park Jeongmin Brian","year":"2023","unstructured":"Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, and Wenmei Hwu. 2023. Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. arXiv preprint arXiv:2306.16384 (2023)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551819"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575748"},{"key":"e_1_2_1_41_1","volume-title":"GPU Technology Conference, NVIDIA.","author":"Schroeder Tim C","year":"2011","unstructured":"Tim C Schroeder. 2011. Peer-to-peer & unified virtual addressing. In GPU Technology Conference, NVIDIA."},{"key":"e_1_2_1_42_1","volume-title":"Helios: An Efficient Out-of-core GNN Training System on Terabyte-scale Graphs with In-memory Performance. arXiv preprint arXiv:2310.00837","author":"Sun Jie","year":"2023","unstructured":"Jie Sun, Mo Sun, Zheng Zhang, Jun Xie, Zuocheng Shi, Zihan Yang, Jie Zhang, Fei Wu, and Zeke Wang. 2023. Helios: An Efficient Out-of-core GNN Training System on Terabyte-scale Graphs with In-memory Performance. arXiv preprint arXiv:2310.00837 (2023)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556195.2556213"},{"key":"e_1_2_1_44_1","volume-title":"Graph Attention Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rJXMpikCZ","author":"Veli\u010dkovi\u0107 Petar","year":"2018","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph Attention Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rJXMpikCZ"},{"key":"e_1_2_1_45_1","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Vora Keval","year":"2019","unstructured":"Keval Vora. 2019. {LUMOS}:{Dependency-Driven} Disk-based Graph Processing. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 429--442."},{"key":"e_1_2_1_46_1","volume-title":"2016 USENIX Annual Technical Conference (USENIX ATC 16)","author":"Vora Keval","year":"2016","unstructured":"Keval Vora, Guoqing Xu, and Rajiv Gupta. 2016. Load the Edges You Need: A Generic {I\/O} Optimization for Disk-based Graph Processing. In 2016 USENIX Annual Technical Conference (USENIX ATC 16). 507--522."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3552326.3567501"},{"key":"e_1_2_1_48_1","unstructured":"Minjie Wang Da Zheng Zihao Ye Quan Gan Mufei Li Xiang Song Jinjing Zhou Chao Ma Lingfan Yu Yu Gai et al. 2019. Deep graph library: A graph-centric highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315 (2019)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2022.3209435"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-022-00188-2"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494523"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-023-3897-2"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-023-00226-7"},{"key":"e_1_2_1_54_1","volume-title":"International Conference on Machine Learning. PMLR, 25684--25701","author":"Yu Haiyang","year":"2022","unstructured":"Haiyang Yu, Limei Wang, Bokun Wang, Meng Liu, Tianbao Yang, and Shuiwang Ji. 2022. GraphFM: Improving large-scale GNN training via feature momentum. In International Conference on Machine Learning. PMLR, 25684--25701."},{"key":"e_1_2_1_55_1","volume-title":"GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations.","author":"Zeng Hanqing","year":"2019","unstructured":"Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2019. GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS54959.2023.00032"},{"key":"e_1_2_1_57_1","volume-title":"Link prediction based on graph neural networks. Advances in neural information processing systems 31","author":"Zhang Muhan","year":"2018","unstructured":"Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539121"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-023-2835-4"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352127"},{"key":"e_1_2_1_61_1","volume-title":"2015 USENIX Annual Technical Conference (USENIX ATC 15)","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. {GridGraph}:{Large-Scale} Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 375--386."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3681976","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T18:46:40Z","timestamp":1725475600000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3681976"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":61,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3681976"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3681976","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}