{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:25:18Z","timestamp":1771064718537,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Recently Graph Neural Networks (GNNs) have achieved great success in many applications. The mini-batch training has become the de-facto way to train GNNs on giant graphs. However, the mini-batch generation task is extremely expensive which slows down the whole training process. Researchers have proposed several solutions to accelerate the mini-batch generation, however, they (1) fail to exploit the locality of the adjacency matrix, (2) cannot fully utilize the GPU memory, and (3) suffer from the poor adaptability to diverse workloads. In this work, we propose DUCATI, aDual-Cache system to overcome these drawbacks. In addition to the traditionalNfeat-Cache, DUCATI introduces a newAdj-Cache to further accelerate the mini-batch generation and better utilize GPU memory. DUCATI develops a workload-awareDual-Cache Allocator which adaptively finds the best cache allocation plan under different settings. We compare DUCATI with various GNN training systems on four billion-scale graphs under diverse workload settings. The experimental results show that in terms of training time, DUCATI can achieve up to 3.33 times speedup (2.07 times on average) compared to DGL and up to 1.54 times speedup (1.32 times on average) compared to the state-of-the-artSingle-Cache systems. 
We also analyze the time-accuracy trade-offs of DUCATI and four state-of-the-art GNN training systems. The analysis results offer users some guidelines on system selection regarding different input sizes and hardware resources.<\/jats:p>","DOI":"10.1145\/3589311","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-24","source":"Crossref","is-referenced-by-count":33,"title":["DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8560-5006","authenticated-orcid":false,"given":"Xin","family":"Zhang","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8364-3674","authenticated-orcid":false,"given":"Yanyan","family":"Shen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8559-2628","authenticated-orcid":false,"given":"Yingxia","family":"Shao","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8257-5806","authenticated-orcid":false,"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Graph convolutional matrix completion. KDD","author":"van den Berg Rianne","year":"2018","unstructured":"Rianne van den Berg, Thomas N Kipf, and Max Welling. 2018. Graph convolutional matrix completion. 
KDD (2018)."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_2_3_1","volume-title":"Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. 233--244","author":"Bulucc Aydin","year":"2009","unstructured":"Aydin Bulucc, Jeremy T Fineman, Matteo Frigo, John R Gilbert, and Charles E Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. 233--244."},{"key":"e_1_2_2_4_1","unstructured":"Jianfei Chen Jun Zhu and Le Song. 2018. Stochastic training of graph convolutional networks with variance reduction. In ICML."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824077"},{"key":"e_1_2_2_6_1","volume-title":"Global neighbor sampling for mixed CPU-GPU training on giant graphs. KDD","author":"Dong Jialin","year":"2021","unstructured":"Jialin Dong, Da Zheng, Lin F Yang, and Geroge Karypis. 2021. Global neighbor sampling for mixed CPU-GPU training on giant graphs. KDD (2021)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Wenqi Fan Yao Ma Qing Li Yuan He Eric Zhao Jiliang Tang and Dawei Yin. 2019. Graph neural networks for social recommendation. In WWW.","DOI":"10.1145\/3308558.3313488"},{"key":"e_1_2_2_8_1","volume-title":"ICLR Workshop","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. ICLR Workshop (2019)."},{"key":"e_1_2_2_9_1","volume-title":"15th $$USENIX$$ Symposium on Operating Systems Design and Implementation ($$OSDI$$ 21). 551--568.","author":"Gandhi Swapnil","unstructured":"Swapnil Gandhi and Anand Padmanabha Iyer. 2021. P3: Distributed deep graph learning at scale. In 15th $$USENIX$$ Symposium on Operating Systems Design and Implementation ($$OSDI$$ 21). 
551--568."},{"key":"e_1_2_2_10_1","unstructured":"William L Hamilton Rex Ying and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS."},{"key":"e_1_2_2_11_1","volume-title":"Hu","author":"Weihua","year":"2020","unstructured":"Weihua et al. Hu. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, Vol. 33 (2020), 22118--22133."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456244"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2928289"},{"key":"e_1_2_2_15_1","volume-title":"Black-box Adversarial Attack and Defense on Graph Neural Networks. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 1017--1030","author":"Li Haoyang","year":"2022","unstructured":"Haoyang Li, Shimin Di, Zijian Li, Lei Chen, and Jiannong Cao. 2022a. Black-box Adversarial Attack and Defense on Graph Neural Networks. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 1017--1030."},{"key":"e_1_2_2_16_1","volume-title":"CC-GNN: A Community and Contraction-based Graph Neural Network. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 231--240","author":"Li Zhiyuan","year":"2022","unstructured":"Zhiyuan Li, Xun Jian, Yue Wang, and Lei Chen. 2022b. CC-GNN: A Community and Contraction-based Graph Neural Network. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 231--240."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421281"},{"key":"e_1_2_2_18_1","volume-title":"Quiver A Distributed Graph Learning Library for PyTorch Geometric. https:\/\/torch-quiver.readthedocs.io\/en\/latest\/","author":"Mai Luo","year":"2021","unstructured":"Luo Mai. 2021. Quiver A Distributed Graph Learning Library for PyTorch Geometric. 
https:\/\/torch-quiver.readthedocs.io\/en\/latest\/ (2021)."},{"key":"e_1_2_2_19_1","volume-title":"Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu.","author":"Min Seung Won","year":"2021","unstructured":"Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayeto\u011flu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu. 2021. Large graph convolutional network training with gpu-oriented data communication architecture. arXiv preprint arXiv:2103.03330 (2021)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538614"},{"key":"e_1_2_2_21_1","volume-title":"Peer-to-Peer Unified Virtual Addressing. https:\/\/developer.download.nvidia.com\/CUDA\/training\/cuda_webinars_GPUDirect_uva.pdf","author":"Schroeder Tim","year":"2011","unstructured":"Tim Schroeder. 2011. Peer-to-Peer Unified Virtual Addressing. https:\/\/developer.download.nvidia.com\/CUDA\/training\/cuda_webinars_GPUDirect_uva.pdf (2011)."},{"key":"e_1_2_2_22_1","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR."},{"key":"e_1_2_2_23_1","volume-title":"MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks. In Eighteenth European Conference on Computer Systems (EuroSys '23)","author":"Waleffe Roger","year":"2023","unstructured":"Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, and Shivaram Venkataraman. 2023. MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks. In Eighteenth European Conference on Computer Systems (EuroSys '23)."},{"key":"e_1_2_2_24_1","unstructured":"Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. 
arXiv:1909.01315 (2019)."},{"key":"e_1_2_2_25_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Wang Yuke","year":"2021","unstructured":"Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 515--531."},{"key":"e_1_2_2_26_1","unstructured":"Max Welling and Thomas N Kipf. 2017. Semi-supervised classification with graph convolutional networks. In ICLR."},{"key":"e_1_2_2_27_1","volume-title":"Graph neural networks in recommender systems: a survey. Comput. Surveys","author":"Wu Shiwen","year":"2020","unstructured":"Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. 2020. Graph neural networks in recommender systems: a survey. Comput. Surveys (2020)."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519557"},{"key":"e_1_2_2_29_1","volume-title":"Predicting multicellular function through multi-layer tissue networks. Bioinformatics","author":"Zitnik Marinka","year":"2017","unstructured":"Marinka Zitnik and Jure Leskovec. 2017. Predicting multicellular function through multi-layer tissue networks. 
Bioinformatics (2017)."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589311","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589311","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:13Z","timestamp":1750178773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589311"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589311"],"URL":"https:\/\/doi.org\/10.1145\/3589311","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}