{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T21:53:52Z","timestamp":1768082032591,"version":"3.49.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,29]],"date-time":"2021-09-29T00:00:00Z","timestamp":1632873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61832006, 61825202, 62072193, and 61929103"],"award-info":[{"award-number":["61832006, 61825202, 62072193, and 61929103"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang Lab","award":["2021KD0AB01"],"award-info":[{"award-number":["2021KD0AB01"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2020kfyXJJS018"],"award-info":[{"award-number":["2020kfyXJJS018"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>\n            Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still take a long time to converge because active vertices\u2019 new states propagate inefficiently along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system\n            <jats:italic>LargeGraph<\/jats:italic>\n            . 
Unlike existing out-of-GPU-memory systems, LargeGraph proposes a\n            <jats:italic>dependency-aware data-driven execution approach<\/jats:italic>\n            , which can significantly accelerate the propagation of active vertices\u2019 states along graph paths with low data access cost and high parallelism. Specifically, according to the dependencies between vertices, it loads and processes only the graph data associated with dependency chains originating from active vertices, which reduces access cost. Because of the power-law property, most active vertices frequently use a small, evolving set of paths to propagate their new states. This small set of paths is dynamically identified and maintained, and is handled efficiently on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19\u201311.62\u00d7), Graphie (3.02\u20139.41\u00d7), Garaph (2.75\u20138.36\u00d7), and Subway (2.45\u20134.15\u00d7).\n          <\/jats:p>","DOI":"10.1145\/3477603","type":"journal-article","created":{"date-parts":[[2021,9,29]],"date-time":"2021-09-29T10:22:55Z","timestamp":1632910975000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["LargeGraph"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0718-8045","authenticated-orcid":false,"given":"Yu","family":"Zhang","sequence":"first","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Da","family":"Peng","sequence":"additional","affiliation":[{"name":"National Engineering Research Center 
for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Haikun","family":"Liu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Lin","family":"Gu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China"}]},{"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2021,9,29]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Retrieved","year":"2020","unstructured":"Lemur. 2020. ClueWeb12 Web Graph. 
Retrieved August 17, 2021 from http:\/\/www.lemurproject.org\/clueweb12\/webgraph.php\/."},{"key":"e_1_2_1_2_1","volume-title":"Retrieved","author":"Commons Web Data","year":"2020","unstructured":"Web Data Commons. 2020. Hyperlink Graphs. Retrieved August 17, 2021 from http:\/\/webdatacommons.org\/hyperlinkgraph\/."},{"key":"e_1_2_1_3_1","volume-title":"Retrieved","author":"Web Algorithmics Laboratory","year":"2020","unstructured":"Laboratory for Web Algorithmics. 2020. Datasets. Retrieved August 17, 2021 from http:\/\/law.di.unimi.it\/datasets.php."},{"key":"e_1_2_1_4_1","unstructured":"Stanford. 2020. Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data\/index.html."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018756"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-016-5551-7"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377922"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498258"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192404"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques. 15\u201328","author":"Dathathri R.","unstructured":"R. Dathathri, G. Gill, L. Hoang, V. Jatala, K. Pingali, V. K. Nandivada, H. Dang, and M. Snir. 2019. Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics. 
In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques. 15\u201328."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00054"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384358"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370866"},{"key":"e_1_2_1_14_1","volume-title":"Scott Sallinen, and Matei Ripeanu.","author":"Gharaibeh Abdullah","year":"2014","unstructured":"Abdullah Gharaibeh, Tahsin Reza, Elizeu Santosneto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv:1312.3018."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17\u201330","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 
17\u201330."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.41"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27\u201340","author":"Hong Changwan","unstructured":"Changwan Hong, Aravind Sukumaranrajam, Jinsung Kim, and P. Sadayappan. 2017. MultiGraph: Efficient graph processing on GPUs. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27\u201340."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173180"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157799"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239\u2013252","author":"Khorasani Farzad","unstructured":"Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239\u2013252."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378529"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915204"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 
31\u201346","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31\u201346."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1360\/N112018-00125"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3098976"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference. 411\u2013428","author":"Liu Hang","unstructured":"Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In Proceedings of the 2019 USENIX Annual Technical Conference. 411\u2013428."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2931058"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-018-7443-z"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2017 USENIX Annual Technical Conference. 195\u2013207","author":"Ma Lingxiao","year":"2017","unstructured":"Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the 2017 USENIX Annual Technical Conference. 
195\u2013207."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479\u2013490","author":"Pan Yuechao","unstructured":"Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2017. Multi-GPU graph analytics. In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479\u2013490."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 15th European Conference on Computer Systems. Article 12","author":"Nodehi Sabet Amir Hossein","year":"2020","unstructured":"Amir Hossein Nodehi Sabet, Zhijia Zhao, and Rajiv Gupta. 2020. Subway: Minimizing data transfer during out-of-GPU-memory graph processing. In Proceedings of the 15th European Conference on Computer Systems. Article 12, 16 pages."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807655"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2015.52"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2020.3019641"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/2311906.2311907"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18120-2_18"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Article 11","author":"Wang Yangzihao","unstructured":"Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. 
Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Article 11, 12 pages."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.235"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MNET.2017.1500138NM"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3416495"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 2018 USENIX Annual Technical Conference. 441\u2013452","author":"Zhang Yu","year":"2018","unstructured":"Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Ligang He, Bingsheng He, and Haikun Liu. 2018. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proceedings of the 2018 USENIX Annual Technical Conference. 
441\u2013452."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2624289"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2781241"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304029"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00039"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-014-3472-4"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2014.2328594"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2015.2415810"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2333511"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-014-0748-9"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2776115"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3319406"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356143"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the 24th ACM Symposium on Operating Systems Principles. 472\u2013488","author":"Zwaenepoel Willy","year":"2013","unstructured":"Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. 
472\u2013488."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477603","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477603","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:37Z","timestamp":1750183837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477603"}},"subtitle":["An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing"],"short-title":[],"issued":{"date-parts":[[2021,9,29]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3477603"],"URL":"https:\/\/doi.org\/10.1145\/3477603","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,29]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}