{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T22:42:22Z","timestamp":1774219342846,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:p>Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2\u00d7, 72.1\u00d7, and 16.4\u00d7 speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.<\/jats:p>","DOI":"10.14778\/3659437.3659438","type":"journal-article","created":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T16:22:27Z","timestamp":1717172547000},"page":"1788-1801","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework"],"prefix":"10.14778","volume":"17","author":[{"given":"Junyi","family":"Mei","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Shixuan","family":"Sun","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Chao","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Cheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Cheng","family":"Chen","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Yibo","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Jing","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Cheng","family":"Zhao","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Xiaofeng","family":"Hou","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]},{"given":"Xiaoliang","family":"Cong","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]}],"member":"320","published-online":{"date-parts":[[2024,5,31]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/LA-WEB.2012.19"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/69.3.653"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/242224.242490"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12083-016-0457-0"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098036"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2005.10129104"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358061"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/7902.7903"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3549934"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456244"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3184558.3186929"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1973.5009159"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507173"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507173"},{"key":"e_1_2_1_18_1","volume-title":"10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8--10, 2012. 31--46. https:\/\/www.usenix.org\/conference\/osdi12\/technicalsessions\/presentation\/kyrola"},{"key":"e_1_2_1_19_1","volume-title":"Relational retrieval using a combination of path-constrained random walks. Machine learning 81","author":"Lao Ni","year":"2010","unstructured":"Ni Lao and William W Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81 (2010), 53--67."},{"key":"e_1_2_1_20_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.2019.8761899"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1334"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2012.6266966"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Junyi Mei Shixuan Sun Chao Li Cheng Xu Cheng Chen Yibo Liu Jing Wang Cheng Zhao Xiaofeng Hou Minyi Guo Bingsheng He and Xiaoliang Cong. 2024. FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework. arXiv:2404.08364 [cs.DC]","DOI":"10.14778\/3659437.3659438"},{"key":"e_1_2_1_25_1","unstructured":"NVIDIA. 2022. CUB Documentation. https:\/\/nvlabs.github.io\/cub\/index.html Last accessed on 2023-6-25."},{"key":"e_1_2_1_26_1","unstructured":"NVIDIA. 2023. CUDA Toolkit Documentation cuRAND. https:\/\/docs.nvidia.com\/cuda\/curand\/index.html Last accessed on 2023-6-25."},{"key":"e_1_2_1_27_1","unstructured":"S. Olver and A. Townsend. 2013. Fast inverse transform sampling in one and two dimensions. arXiv: Numerical Analysis (2013)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00060"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565835"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623732"},{"key":"e_1_2_1_31_1","volume-title":"Monte Carlo statistical methods","author":"Robert Christian","unstructured":"Christian Robert and George Casella. 2013. Monte Carlo statistical methods. In Springer Science & Business Media."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005605"},{"key":"e_1_2_1_33_1","unstructured":"Shubhabrata Sengupta Aaron Lefohn and John D Owens. 2006. A work-efficient step-efficient prefix sum algorithm. (2006)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380562"},{"key":"e_1_2_1_35_1","volume-title":"Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-based systems 104","author":"Shi Baoxu","year":"2016","unstructured":"Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-based systems 104 (2016), 123--133."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476257"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2481244.2481248"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588944"},{"key":"e_1_2_1_39_1","volume-title":"Distributed Matrix-Based Sampling for Graph Neural Network Training. arXiv preprint arXiv:2311.02909","author":"Tripathy Alok","year":"2023","unstructured":"Alok Tripathy, Katherine Yelick, and Aydin Buluc. 2023. Distributed Matrix-Based Sampling for Graph Neural Network Training. arXiv preprint arXiv:2311.02909 (2023)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/355744.355749"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","author":"Wang Jizhe","year":"2018","unstructured":"Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, B. Zhao, and D. Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018)."},{"key":"e_1_2_1_43_1","unstructured":"Minjie Wang Lingfan Yu Da Zheng Quan Gan Yu Gai Zihao Ye Mufei Li Jinjing Zhou Qi Huang Chao Ma et al. 2019. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315 (2019)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT52795.2021.00029"},{"key":"e_1_2_1_45_1","volume-title":"Optimizing GPU-based Graph Sampling and Random Walk for Efficiency and Scalability","author":"Wang Pengyu","year":"2023","unstructured":"Pengyu Wang, Cheng Xu, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Xiaofeng Hou, and Minyi Guo. 2023. Optimizing GPU-based Graph Sampling and Random Walk for Efficiency and Scalability. IEEE Trans. Comput. (2023)."},{"key":"e_1_2_1_46_1","volume-title":"2020 USENIX Annual Technical Conference (USENIX ATC 20)","author":"Wang Rui","year":"2020","unstructured":"Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John CS Lui. 2020. {GraphWalker}: An {I\/O-Efficient} and {Resource-Friendly} Graph Analytic System for Fast and Scalable Random Walks. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 559--571."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"3","author":"Wang Shuke","year":"2023","unstructured":"Shuke Wang, Mingxing Zhang, Ke Yang, Kang Chen, Shaonan Ma, Jinlei Jiang, and Yongwei Wu. 2023. NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 466--482."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3015270.3015272"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477132.3483575"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359634"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557443"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415539"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352127"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3659437.3659438","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T21:35:35Z","timestamp":1732138535000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3659437.3659438"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4]]},"references-count":53,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["10.14778\/3659437.3659438"],"URL":"https:\/\/doi.org\/10.14778\/3659437.3659438","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,4]]},"assertion":[{"value":"2024-05-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}