{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T10:51:52Z","timestamp":1770288712788,"version":"3.49.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,3,12]]},"abstract":"<jats:p>Subgraph counting is a fundamental component for many downstream applications such as graph representation learning and query optimization.Since obtaining the exact count is often intractable,there have been a plethora of approximation methods on graph sampling techniques. Nonetheless, the state-of-the-art sampling methods still require massive samples to produce accurate approximations on large data graphs.We propose gSWORD, a GPU framework that leverages the massive parallelism of GPUs to accelerate iterative sampling algorithms for subgraph counting. Despite the embarrassingly parallel nature of the samples, there are unique challenges in accelerating subgraph counting due to its irregular computation logic. To address these challenges, we introduce two GPU-centric optimizations: (1) sample inheritance, enabling threads to inherit samples from neighboring threads to avoid idling, and (2) warp streaming, effectively distributing workloads among threads through a streaming process. Moreover, we propose a CPU-GPU co-processing pipeline that overlaps the sampling and enumeration processes to mitigate the underestimation issue. Experimental results demonstrate that deploying state-of-the-art sampling algorithms on gSWORD can perform millions of samples per second. The co-processing pipeline substantially improves the estimation accuracy in the cases where existing methods encounter severe underestimations with negligible overhead.<\/jats:p>","DOI":"10.1145\/3639288","type":"journal-article","created":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T18:51:32Z","timestamp":1711479092000},"page":"1-26","source":"Crossref","is-referenced-by-count":3,"title":["gSWORD: GPU-accelerated Sampling for Subgraph Counting"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4026-3254","authenticated-orcid":false,"given":"Chang","family":"Ye","sequence":"first","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9646-291X","authenticated-orcid":false,"given":"Yuchen","family":"Li","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4060-9438","authenticated-orcid":false,"given":"Shixuan","family":"Sun","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0706-7286","authenticated-orcid":false,"given":"Wentian","family":"Guo","sequence":"additional","affiliation":[{"name":"Uniffilated, Sunnyvale, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2023. The technical report for gsword. https:\/\/github.com\/Gibyeng\/gsword\/blob\/main\/report\/report.pdf."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3129246"},{"key":"e_1_2_1_3_1","first-page":"1","article-title":"GuP: Fast Subgraph Matching by Guard-based Pruning","volume":"1","author":"Arai Junya","year":"2023","unstructured":"Junya Arai, Yasuhiro Fujiwara, and Makoto Onizuka. 2023. GuP: Fast Subgraph Matching by Guard-based Pruning. PACMMOD 1, 2 (2023), 1--26.","journal-title":"PACMMOD"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Blair Archibald Fraser Dunlop Ruth Hoffmann Ciaran McCreesh Patrick Prosser and James Trimble. 2019. Sequential and parallel solution-biased search for subgraph algorithms. In CPAIOR. 20--38.","DOI":"10.1007\/978-3-030-19212-9_2"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300086"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Fei Bi Lijun Chang Xuemin Lin Lu Qin and Wenjie Zhang. 2016. Efficient subgraph matching by postponing cartesian products. In SIGMOD. 1199--1214.","DOI":"10.1145\/2882903.2915236"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281500.1281647"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3021924.3021940"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Stephen A Cook. 1971. The complexity of theorem-proving procedures. In STOC. 151--158.","DOI":"10.1145\/800157.805047"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Rayane El Sibai Yousra Chabchoub Jacques Demerjian Zakia Kazi-Aoul and Kablan Barbar. 2016. Sampling algorithms in data stream environments. In ICDEc. 29--36.","DOI":"10.1109\/ICDEC.2016.7563142"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.epidem.2018.04.003"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1372"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. 855--864.","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Wentian Guo Yuchen Li Mo Sha Bingsheng He Xiaokui Xiao and Kian-Lee Tan. 2020. Gpu-accelerated subgraph enumeration on partitioned graphs. In SIGMOD. 1067--1082.","DOI":"10.1145\/3318464.3389699"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3035564"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Myoungji Han Hyunjoon Kim Geonmo Gu Kunsoo Park and Wook-Shin Han. 2019. Efficient subgraph matching: Harmonizing dynamic programming adaptive matching order and failing set together. In SIGMOD. 1429--1446.","DOI":"10.1145\/3299869.3319880"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Za\u00efd Harchaoui and Francis Bach. 2007. Image classification with segmentation graph kernels. In CVPR. 1--8.","DOI":"10.1109\/CVPR.2007.383049"},{"key":"e_1_2_1_18_1","volume-title":"Towards efficient motif-based graph partitioning: An adaptive sampling approach","author":"Huang Shixun","unstructured":"Shixun Huang, Yuchen Li, Zhifeng Bao, and Zhao Li. 2021. Towards efficient motif-based graph partitioning: An adaptive sampling approach. In ICDE. IEEE, 528--539."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Abhinav Jangda Sandeep Polisetty Arjun Guha and Marco Serafini. 2021. Accelerating graph sampling for graph machine learning using GPUs. In EuroSys. 311--326.","DOI":"10.1145\/3447786.3456244"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587144"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Kyoungmin Kim Hyeonji Kim George Fletcher and Wook-Shin Han. 2021. Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation. In SIGMOD. 964--976.","DOI":"10.1145\/3448016.3457246"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00708-y"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Feifei Li Bin Wu Ke Yi and Zhuoyue Zhao. 2016. Wander join: Online aggregation via random walks. In SIGMOD. 615--629.","DOI":"10.1145\/2882903.2915235"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284551"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687738"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Santosh Pandey Lingda Li Adolfy Hoisie Xiaoye S Li and Hang Liu. 2020. C-SAW: A framework for graph sampling and random walk on GPUs. In SC. 1--15.","DOI":"10.1109\/SC41405.2020.00060"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Yeonsu Park Seongyun Ko Sourav S Bhowmick Kyoungmin Kim Kijae Hong and Wook-Shin Han. 2020. G-CARE: a framework for performance benchmarking of cardinality estimation techniques for subgraph matching. In SIGMOD. 1099--1114.","DOI":"10.1145\/3318464.3389702"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623732"},{"key":"e_1_2_1_29_1","first-page":"1968","article-title":"Cpu scheduling algorithms: A survey","volume":"5","author":"Qureshi Imran","year":"2014","unstructured":"Imran Qureshi. 2014. Cpu scheduling algorithms: A survey. International Journal of Advanced Networking and Applications(IJANA) 5, 4 (2014), 1968.","journal-title":"International Journal of Advanced Networking and Applications(IJANA)"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3433652"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Purnamrita Sarkar and Andrew W Moore. 2011. Random walks in social networks and their applications: a survey. In Social Network Data Analytics. 43--77.","DOI":"10.1007\/978-1-4419-8462-3_3"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Mo Sha Yuchen Li and Kian-Lee Tan. 2021. Self-adaptive graph traversal on gpus. In SIGMOD. 1558--1570.","DOI":"10.1145\/3448016.3457279"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453899"},{"key":"e_1_2_1_34_1","unstructured":"Nino Shervashidze SVN Vishwanathan Tobias Petri Kurt Mehlhorn and Karsten Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In AISTATS. 488--495."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476257"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Shixuan Sun and Qiong Luo. 2020. In-memory subgraph matching: An in-depth study. In SIGMOD. 1083--1098.","DOI":"10.1145\/3318464.3380581"},{"key":"e_1_2_1_37_1","first-page":"1","article-title":"Efficient GPU-Accelerated Subgraph Matching","volume":"1","author":"Sun Xibo","year":"2023","unstructured":"Xibo Sun and Qiong Luo. 2023. Efficient GPU-Accelerated Subgraph Matching. PACMMOD 1, 2 (2023), 1--26.","journal-title":"PACMMOD"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2311906.2311907"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2009.0029"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Hanchen Wang Rong Hu Ying Zhang Lu Qin Wei Wang and Wenjie Zhang. 2022. Neural Subgraph Counting with Wasserstein Estimator. In SIGMOD. 160--175.","DOI":"10.1145\/3514221.3526163"},{"key":"e_1_2_1_41_1","volume-title":"Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In PACT. 304--317.","author":"Wang Pengyu","year":"2021","unstructured":"Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In PACT. 304--317."},{"key":"e_1_2_1_42_1","unstructured":"Rui Wang Yongkun Li Hong Xie Yinlong Xu and John CS Lui. 2020. GraphWalker: An I\/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks. In USENIX ATC. 559--571."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2019.2952908"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAI.2021.3076021"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Ke Yang MingXing Zhang Kang Chen Xiaosong Ma Yang Bai and Yong Jiang. 2019. Knightking: a fast distributed graph random walk engine. In SOSP. 524--537.","DOI":"10.1145\/3341301.3359634"},{"key":"e_1_2_1_47_1","volume-title":"Uninet: Scalable network representation learning with metropolis-hastings sampling. In ICDE. 516--527.","author":"Yao Xingyu","year":"2021","unstructured":"Xingyu Yao, Yingxia Shao, Bin Cui, and Lei Chen. 2021. Uninet: Scalable network representation learning with metropolis-hastings sampling. In ICDE. 516--527."},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Chang Ye Yuchen Li Bingsheng He Zhao Li and Jianling Sun. 2021. Gpu-accelerated graph label propagation for real-time fraud detection. In SIGMOD. 2348--2356.","DOI":"10.1145\/3448016.3452774"},{"key":"e_1_2_1_49_1","first-page":"1","article-title":"Large-Scale Graph Label Propagation on GPUs","volume":"01","author":"Ye Chang","year":"2023","unstructured":"Chang Ye, Yuchen Li, Bingsheng He, Zhao Li, and Jianling Sun. 2023. Large-Scale Graph Label Propagation on GPUs. IEEE Transactions on Knowledge and Data Engineering(TKDE) 01 (2023), 1--14.","journal-title":"IEEE Transactions on Knowledge and Data Engineering(TKDE)"},{"key":"e_1_2_1_50_1","volume-title":"Jeffrey Xu Yu, and Yuanyuan Zhu","author":"Zhang Hao","year":"2022","unstructured":"Hao Zhang, Qiyan Li, Kangfei Zhao, Jeffrey Xu Yu, and Yuanyuan Zhu. 2022. How Learning Can Help Complex Cyclic Join Decomposition. In ICDE. IEEE, 3138--3141."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Luming Zhang Mingli Song Zicheng Liu Xiao Liu Jiajun Bu and Chun Chen. 2013. Probabilistic graphlet cut: Exploiting spatial structure cue for weakly supervised image segmentation. In CVPR. 1908--1915.","DOI":"10.1109\/CVPR.2013.249"},{"key":"e_1_2_1_52_1","volume-title":"Hao Zhang, Qiyan Li, and Yu Rong.","author":"Zhao Kangfei","year":"2021","unstructured":"Kangfei Zhao, Jeffrey Xu Yu, Hao Zhang, Qiyan Li, and Yu Rong. 2021. A learned sketch for subgraph counting. In SIGMOD. 2142--2155."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920887"},{"key":"e_1_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Zhuoyue Zhao Robert Christensen Feifei Li Xiao Hu and Ke Yi. 2018. Random sampling over joins revisited. In SIGMOD. 1525--1539.","DOI":"10.1145\/3183713.3183739"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639288","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:16:51Z","timestamp":1755789411000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,12]]}},"alternative-id":["10.1145\/3639288"],"URL":"https:\/\/doi.org\/10.1145\/3639288","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,12]]}}}