{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T09:56:49Z","timestamp":1772791009274,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61832006, 61825202, 61702202, and 61929103"],"award-info":[{"award-number":["61832006, 61825202, 61702202, and 61929103"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2018YFB1003500"],"award-info":[{"award-number":["2018YFB1003500"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>\n            Recently, iterative graph algorithms are proposed to be handled by GPU-accelerated systems. However, in iterative graph processing, the parallelism of GPU is still underutilized by existing GPU-based solutions. In fact, because of the power-law property of the natural graphs, the paths between a small set of important vertices (e.g., high-degree vertices) play a more important role in iterative graph processing\u2019s convergence speed. Based on this fact, for faster iterative graph processing on GPUs, this article develops a novel system, called\n            <jats:italic>AsynGraph<\/jats:italic>\n            , to maximize its data parallelism. It first proposes an efficient\n            <jats:italic>structure-aware asynchronous processing way<\/jats:italic>\n            . It enables the state propagations of most vertices to be effectively conducted on the GPUs in a concurrent way to get a higher GPU utilization ratio through efficiently handling the paths between the important vertices. Specifically, a graph sketch (consisting of the paths between the important vertices) is extracted from the original graph to serve as a fast bridge for most state propagations. Through efficiently processing this sketch more times within each round of graph processing, higher parallelism of GPU can be utilized to accelerate most state propagations. In addition, a\n            <jats:italic>forward-backward intra-path processing way<\/jats:italic>\n            is also adopted to asynchronously handle the vertices on each path, aiming to further boost propagations along paths and also ensure smaller data access cost. In comparison with existing GPU-based systems, i.e., Gunrock, Groute, Tigr, and DiGraph, AsynGraph can speed up iterative graph processing by 3.06\u201311.52, 2.47\u20135.40, 2.23\u20139.65, and 1.41\u20134.05 times, respectively.\n          <\/jats:p>","DOI":"10.1145\/3416495","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T11:23:50Z","timestamp":1601465030000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["AsynGraph"],"prefix":"10.1145","volume":"17","author":[{"given":"Yu","family":"Zhang","sequence":"first","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Lin","family":"Gu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Kan","family":"Hu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Haikun","family":"Liu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China"}]},{"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2019. Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data\/index.html.  2019. Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data\/index.html."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 2017 USENIX Annual Technical Conference. 125--137","author":"Ai Zhiyuan","year":"2017","unstructured":"Zhiyuan Ai , Mingxing Zhang , Yongwei Wu , Xuehai Qian , Kang Chen , and Weimin Zheng . 2017 . Squeezing out all the value of loaded data: An out-of-core graph processing system with reduced disk I\/O . In Proceedings of the 2017 USENIX Annual Technical Conference. 125--137 . Zhiyuan Ai, Mingxing Zhang, Yongwei Wu, Xuehai Qian, Kang Chen, and Weimin Zheng. 2017. Squeezing out all the value of loaded data: An out-of-core graph processing system with reduced disk I\/O. In Proceedings of the 2017 USENIX Annual Technical Conference. 125--137."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1572769.1572792"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018756"},{"key":"e_1_2_1_5_1","volume-title":"Computing top-k closeness centrality faster in unweighted graphs. ACM Transactions on Knowledge Discovery from Data 13, 5","author":"Bergamini Elisabetta","year":"2019","unstructured":"Elisabetta Bergamini , Michele Borassi , Pierluigi Crescenzi , Andrea Marino , and Henning Meyerhenke . 2019. Computing top-k closeness centrality faster in unweighted graphs. ACM Transactions on Knowledge Discovery from Data 13, 5 ( 2019 ), 1--40. Elisabetta Bergamini, Michele Borassi, Pierluigi Crescenzi, Andrea Marino, and Henning Meyerhenke. 2019. Computing top-k closeness centrality faster in unweighted graphs. ACM Transactions on Knowledge Discovery from Data 13, 5 (2019), 1--40."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-016-5551-7"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370866"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17--30","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez , Yucheng Low , Haijie Gu , Danny Bickson , and Carlos Guestrin . 2012 . PowerGraph: Distributed graph-parallel computation on natural graphs . In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17--30 . Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17--30."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.41"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295729"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27--40","author":"Hong Changwan","unstructured":"Changwan Hong , Aravind Sukumaranrajam , Jinsung Kim , and P. Sadayappan . 2017. MultiGraph: Efficient graph processing on GPUs . In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27--40 . Changwan Hong, Aravind Sukumaranrajam, Jinsung Kim, and P. Sadayappan. 2017. MultiGraph: Efficient graph processing on GPUs. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27--40."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173180"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157799"},{"key":"e_1_2_1_14_1","volume-title":"Hub-accelerator: Fast and exact shortest path computation in large social networks. In arXiv. 1--12.","author":"Jin Ruoming","year":"2013","unstructured":"Ruoming Jin , Ning Ruan , Bo You , and Haixun Wang . 2013 . Hub-accelerator: Fast and exact shortest path computation in large social networks. In arXiv. 1--12. Ruoming Jin, Ning Ruan, Bo You, and Haixun Wang. 2013. Hub-accelerator: Fast and exact shortest path computation in large social networks. In arXiv. 1--12."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239--252","author":"Khorasani Farzad","unstructured":"Farzad Khorasani , Keval Vora , Rajiv Gupta , and Laxmi N. Bhuyan . 2014. CuSha: Vertex-centric graph processing on GPUs . In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239--252 . Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239--252."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915204"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2907294.2907312"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205292"},{"key":"e_1_2_1_19_1","first-page":"1","article-title":"Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the 2005 International Conference for High Performance Computing","volume":"68","author":"Liu Hang","year":"2015","unstructured":"Hang Liu and H. Howie Huang . 2015 . Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the 2005 International Conference for High Performance Computing , Networking, Storage and Analysis. 68 : 1 \u2013 68 :12. Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-first graph traversal on GPUs. In Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis. 68:1\u201368:12.","journal-title":"Networking, Storage and Analysis."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference. 411--428","author":"Liu Hang","unstructured":"Hang Liu and H. Howie Huang . 2019. SIMD-X: Programming and processing of graph algorithms on GPUs . In Proceedings of the 2019 USENIX Annual Technical Conference. 411--428 . Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In Proceedings of the 2019 USENIX Annual Technical Conference. 411--428."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-018-7443-z"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 2017 USENIX Annual Technical Conference. 195--207","author":"Ma Lingxiao","year":"2017","unstructured":"Lingxiao Ma , Zhi Yang , Han Chen , Jilong Xue , and Yafei Dai . 2017 . Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication . In Proceedings of the 2017 USENIX Annual Technical Conference. 195--207 . Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the 2017 USENIX Annual Technical Conference. 195--207."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00010"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479--490","author":"Pan Yuechao","unstructured":"Yuechao Pan , Yangzihao Wang , Yuduo Wu , Carl Yang , and John D. Owens . 2017. Multi-GPU graph analytics . In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479--490 . Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2017. Multi-GPU graph analytics. In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479--490."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3282495.3282501"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037747"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 11:1\u201311:12","author":"Wang Yangzihao","unstructured":"Yangzihao Wang , Andrew Davidson , Yuechao Pan , Yuduo Wu , Andy Riffel , and John D. Owens . 2016. Gunrock: A high-performance graph processing library on the GPU . In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 11:1\u201311:12 . Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 11:1\u201311:12."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3108140"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173208"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.235"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2624289"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2781241"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304029"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-014-3472-4"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3170434"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3416495","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3416495","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:21Z","timestamp":1750197681000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3416495"}},"subtitle":["Maximizing Data Parallelism for Efficient Iterative Graph Processing on GPUs"],"short-title":[],"issued":{"date-parts":[[2020,9,30]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3416495"],"URL":"https:\/\/doi.org\/10.1145\/3416495","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,30]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}