{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T13:08:11Z","timestamp":1775912891761,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB4502305"],"award-info":[{"award-number":["2023YFB4502305"]}]},{"name":"Beijing Natural Science Foundation","award":["4232036"],"award-info":[{"award-number":["4232036"]}]},{"name":"CAS Project for Youth Innovation Promotion Association"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>With the continuous growth of user scale and application data, the demand for large-scale concurrent graph processing is increasing. Typically, large-scale concurrent graph processing jobs need to process corresponding snapshots of dynamically changing graph data to obtain information at different time points. To enhance the throughput of such applications, current solutions concurrently process multiple graph snapshots on the GPU. However, when dealing with rapidly changing graph data, transferring multiple snapshots of concurrent jobs to the GPU results in high data transfer overhead between CPU and GPU. Additionally, the execution mode of existing work suffers from underutilization of GPU computational resources.<\/jats:p>\n          <jats:p>In this work, we introduce CGCGraph, which can be integrated into existing GPU graph processing systems like Subway, to enable efficient concurrent graph snapshot processing jobs and enhance overall system resource utilization. The key idea is to offload unshared graph data of multiple concurrent snapshots to the CPU, reducing CPU-GPU transfer overhead. 
By implementing CPU-GPU co-execution, there is potential for enhanced utilization of GPU computing resources. Specifically, CGCGraph leverages kernel fusion to process shared graph data concurrently on the GPU, while executing all snapshots in parallel on the CPU, with each snapshot assigned a dedicated thread. This approach enables efficient concurrent processing within a novel CPU-GPU co-execution model, incorporating three optimization strategies targeting storage, computation, and synchronization. We integrate CGCGraph with Subway, an existing system designed for out-of-GPU-memory static graph processing. Experimental results show that the integration of CGCGraph with current GPU-based systems obtains performance improvements ranging from 1.7 to 4.5 times.<\/jats:p>","DOI":"10.1145\/3744904","type":"journal-article","created":{"date-parts":[[2025,6,16]],"date-time":"2025-06-16T07:17:43Z","timestamp":1750058263000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["CGCGraph: Efficient CPU-GPU Co-execution for Concurrent Dynamic Graph Processing"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-5210-8759","authenticated-orcid":false,"given":"Yiming","family":"Sun","sequence":"first","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"Zhongguancun Laboratory","place":["Beijing, China"]},{"name":"University of the Chinese Academy of Sciences School of Computer Science and Technology","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4008-9703","authenticated-orcid":false,"given":"Jie","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, 
China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-2521","authenticated-orcid":false,"given":"Huawei","family":"Cao","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"Zhongguancun Laboratory","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1971-1657","authenticated-orcid":false,"given":"Yuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0494-6332","authenticated-orcid":false,"given":"Xuejun","family":"An","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"Zhongguancun Laboratory","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5845-6965","authenticated-orcid":false,"given":"Junying","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"Shanghai Innovation Center for Processor Technologies","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4598-1685","authenticated-orcid":false,"given":"Xiaochun","family":"Ye","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,17]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575713"},{"key":"e_1_3_1_3_2","first-page":"227","volume-title":"Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web","author":"Boldi Paolo","year":"2014","unstructured":"Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. 2014. BUbiNG: Massive crawling for the masses. 
In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 227\u2013228."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1480506.1480511"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","unstructured":"Deepayan Chakrabarti Yiping Zhan and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM). 442\u2013446. DOI:10.1137\/1.9781611972740.43","DOI":"10.1137\/1.9781611972740.43"},{"issue":"10","key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"722","DOI":"10.26438\/ijcse\/v6i10.722729","article-title":"Application of graph theory in social media","volume":"6","author":"Chakraborty Anwesha","year":"2018","unstructured":"Anwesha Chakraborty, Trina Dutta, Sushmita Mondal, and Asoke Nath. 2018. Application of graph theory in social media. International Journal of Computer Sciences and Engineering 6, 10 (2018), 722\u2013729.","journal-title":"International Journal of Computer Sciences and Engineering"},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis","author":"Chen Hongzheng","year":"2021","unstructured":"Hongzheng Chen, Minghua Shen, Nong Xiao, and Yutong Lu. 2021. Krill: A compiler and runtime system for concurrent graph processing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. 
1\u201316."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.14778\/3648160.3648179"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384358"},{"key":"e_1_3_1_12_2","unstructured":"Abdullah Gharaibeh Tahsin Reza Elizeu Santos-Neto Lauro Beltrao Costa Scott Sallinen and Matei Ripeanu. 2014. Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems. arXiv:1312.3018. Retrieved from https:\/\/arxiv.org\/abs\/1312.3018"},{"key":"e_1_3_1_13_2","first-page":"17","volume-title":"Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912)","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912). USENIX Association, USA, 17\u201330."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.1996.492091"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TransAI49837.2020.00029"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_3_1_17_2","first-page":"31","volume-title":"Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912)","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201912). 
USENIX Association, USA, 31\u201346."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081893"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113298"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303974"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2818185"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145832"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Youshan Miao Wentao Han Kaiwei Li Ming Wu Fan Yang Lidong Zhou Vijayan Prabhakaran Enhong Chen and Wenguang Chen. 2015. Immortalgraph: A system for storage and analysis of temporal graphs. ACM Transactions on Storage (TOS) 11 3 (2015) 1\u201334.","DOI":"10.1145\/2700302"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.28"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783297"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387537"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","unstructured":"Joakim Skarding Bogdan Gabrys and Katarzyna Musial. 2021. Foundations and modeling of dynamic networks using dynamic graph neural networks: A survey. IEEE Access 9 (2021) 79143\u201379168. 
DOI:10.1109\/ACCESS.2021.3082932","DOI":"10.1109\/ACCESS.2021.3082932"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809983"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2992784"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037748"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2337542.2337546"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295733"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3444844"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00049"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600222"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2012.138"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2015.7237065"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3567955.3567963"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-DSS-SmartCity-DependSys57074.2022.00176"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3605731.3605746"},{"issue":"6","key":"e_1_3_1_47_2","first-page":"5823","article-title":"Egraph: Efficient concurrent GPU-based dynamic graph processing","volume":"35","author":"Zhang Yu","year":"2022","unstructured":"Yu Zhang, Yuxuan Liang, Jin Zhao, Fubing Mao, Lin Gu, Xiaofei Liao, Hai Jin, Haikun Liu, Song Guo, Yangqing Zeng, et\u00a0al. 2022. Egraph: Efficient concurrent GPU-based dynamic graph processing. 
IEEE Transactions on Knowledge and Data Engineering 35, 6 (2022), 5823\u20135836.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477603"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3280850"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3600091"},{"key":"e_1_3_1_51_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis","author":"Zhao Jin","year":"2019","unstructured":"Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, and Yicheng Chen. 2019. GraphM: An efficient storage system for high throughput of concurrent graph processing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. 1\u201314."},{"key":"e_1_3_1_52_2","first-page":"375","volume-title":"Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC\u201915)","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale graph processing on a single machine using 2-level hierarchical partitioning. In Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC\u201915). 
375\u2013386."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3744904","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T13:44:40Z","timestamp":1758116680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3744904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,17]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3744904"],"URL":"https:\/\/doi.org\/10.1145\/3744904","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,17]]},"assertion":[{"value":"2024-07-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}