{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T10:50:19Z","timestamp":1770979819457,"version":"3.50.1"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:p>Due to the limited capacity of GPU memory, the majority of prior work on graph applications on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware and software advances make it possible to address much larger host memory transparently as a part of a feature known as unified virtual memory. While accessing host memory over an interconnect is understandably slower, the problem space has not been sufficiently explored in the context of a challenging workload with low computational intensity and an irregular data access pattern such as graph traversal. We analyse the performance of breadth first search (BFS) for several large graphs in the context of unified memory and identify the key factors that contribute to slowdowns. Next, we propose a lightweight offline graph reordering algorithm, HALO (Harmonic Locality Ordering), that can be used as a pre-processing step for static graphs. HALO yields speedups of 1.5x-1.9x over baseline in subsequent traversals. Our method specifically aims to cover large directed real world graphs in addition to undirected graphs whereas prior methods only account for the latter. Additionally, we demonstrate ties between the locality ordering problem and graph compression and show that prior methods from graph compression such as recursive graph bisection can be suitably adapted to this problem.<\/jats:p>","DOI":"10.14778\/3384345.3384358","type":"journal-article","created":{"date-parts":[[2020,3,26]],"date-time":"2020-03-26T14:21:06Z","timestamp":1585232466000},"page":"1119-1133","source":"Crossref","is-referenced-by-count":51,"title":["Traversing large graphs on GPUs with unified memory"],"prefix":"10.14778","volume":"13","author":[{"given":"Prasun","family":"Gera","sequence":"first","affiliation":[{"name":"Georgia Tech"}]},{"given":"Hyojong","family":"Kim","sequence":"additional","affiliation":[{"name":"Georgia Tech"}]},{"given":"Piyush","family":"Sao","sequence":"additional","affiliation":[{"name":"Oak Ridge National Lab"}]},{"given":"Hyesoon","family":"Kim","sequence":"additional","affiliation":[{"name":"Georgia Tech"}]},{"given":"David","family":"Bader","sequence":"additional","affiliation":[{"name":"New Jersey Institute of Tech"}]}],"member":"320","published-online":{"date-parts":[[2020,3,26]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Heterogeneous memory management (hmm) the linux kernel documentation. https: \/\/www.kernel.org\/doc\/html\/latest\/vm\/hmm.html. (Accessed on 02\/24\/2020).  Heterogeneous memory management (hmm) the linux kernel documentation. https: \/\/www.kernel.org\/doc\/html\/latest\/vm\/hmm.html. (Accessed on 02\/24\/2020)."},{"key":"e_1_2_1_2_1","unstructured":"Shared virtual memory. https:\/\/www.khronos.org\/registry\/OpenCL\/sdk\/2. 1\/docs\/man\/xhtml\/sharedVirtualMemory.html. (Accessed on 02\/24\/2020).  Shared virtual memory. https:\/\/www.khronos.org\/registry\/OpenCL\/sdk\/2. 1\/docs\/man\/xhtml\/sharedVirtualMemory.html. (Accessed on 02\/24\/2020)."},{"key":"e_1_2_1_3_1","volume-title":"Emptyheaded: A relational engine for graph processing. ACM Transactions on Database Systems (TODS), 42(4):20","author":"Aberger C. R.","year":"2017","unstructured":"C. R. Aberger , A. Lamb , S. Tu , A. Notzli , K. Olukotun , and C. Re . Emptyheaded: A relational engine for graph processing. ACM Transactions on Database Systems (TODS), 42(4):20 , 2017 . C. R. Aberger, A. Lamb, S. Tu, A. Notzli, K. Olukotun, and C. Re. Emptyheaded: A relational engine for graph processing. ACM Transactions on Database Systems (TODS), 42(4):20, 2017."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1109557.1109623"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3390\/a2031031"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.110"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123975"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1777879.1777889"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2018.8573478"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1906679"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1155\/2013\/702694"},{"key":"e_1_2_1_12_1","volume-title":"The GAP Benchmark Suite","author":"Beamer S.","year":"2015","unstructured":"S. Beamer , K. Asanovic , and D. Patterson . The GAP Benchmark Suite , 2015 . S. Beamer, K. Asanovic, and D. Patterson. The GAP Benchmark Suite, 2015."},{"key":"e_1_2_1_13_1","volume-title":"Hoe er. Survey and taxonomy of lossless graph compression and space-efficient graph representations. arXiv preprint arXiv:1806.01799","author":"Besta M.","year":"2018","unstructured":"M. Besta and T. Hoe er. Survey and taxonomy of lossless graph compression and space-efficient graph representations. arXiv preprint arXiv:1806.01799 , 2018 . M. Besta and T. Hoe er. Survey and taxonomy of lossless graph compression and space-efficient graph representations. arXiv preprint arXiv:1806.01799, 2018."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.587"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2567948.2577304"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2013.865686"},{"key":"e_1_2_1_19_1","volume-title":"A faster algorithm for betweenness centrality. Journal of mathematical sociology, 25(2):163--177","author":"Brandes U.","year":"2001","unstructured":"U. Brandes . A faster algorithm for betweenness centrality. Journal of mathematical sociology, 25(2):163--177 , 2001 . U. Brandes. A faster algorithm for betweenness centrality. Journal of mathematical sociology, 25(2):163--177, 2001."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1583991.1584053"},{"key":"e_1_2_1_21_1","first-page":"65","volume-title":"Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Buluc A.","unstructured":"A. Buluc and K. Madduri . Parallel breadth-first search on distributed memory systems . In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis , page 65 . ACM, 2011. A. Buluc and K. Madduri. Parallel breadth-first search on distributed memory systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, page 65. ACM, 2011."},{"key":"e_1_2_1_22_1","first-page":"1","volume-title":"Hornet: An efficient data structure for dynamic sparse graphs and matrices on gpus. In 2018 IEEE High Performance extreme Computing Conference (HPEC)","author":"Busato F.","year":"2018","unstructured":"F. Busato , O. Green , N. Bombieri , and D. A. Bader . Hornet: An efficient data structure for dynamic sparse graphs and matrices on gpus. In 2018 IEEE High Performance extreme Computing Conference (HPEC) , pages 1 -- 7 . IEEE , 2018 . F. Busato, O. Green, N. Bombieri, and D. A. Bader. Hornet: An efficient data structure for dynamic sparse graphs and matrices on gpus. In 2018 IEEE High Performance extreme Computing Conference (HPEC), pages 1--7. IEEE, 2018."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01933580"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557049"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/800195.805928"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.45"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3210377.3210414"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939862"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/365411.365449"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC47752.2019.9041948"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45591-4_68"},{"key":"e_1_2_1_33_1","volume-title":"Computers and intractability","author":"Garey M. R.","unstructured":"M. R. Garey and D. S. Johnson . Computers and intractability , volume 29 . M. R. Garey and D. S. Johnson. Computers and intractability, volume 29."},{"key":"e_1_2_1_34_1","volume-title":"STANFORD UNIV CA DEPT OF COMPUTER SCIENCE","author":"George J. A.","year":"1971","unstructured":"J. A. George . Computer implementation of the finite element method. Technical report , STANFORD UNIV CA DEPT OF COMPUTER SCIENCE , 1971 . J. A. George. Computer implementation of the finite element method. Technical report, STANFORD UNIV CA DEPT OF COMPUTER SCIENCE, 1971."},{"key":"e_1_2_1_35_1","volume-title":"Efficient large-scale graph processing on hybrid cpu and gpu systems. arXiv preprint arXiv:1312.3018","author":"Gharaibeh A.","year":"2013","unstructured":"A. Gharaibeh , T. Reza , E. Santos-Neto , L. B. Costa , S. Sallinen , and M. Ripeanu . Efficient large-scale graph processing on hybrid cpu and gpu systems. arXiv preprint arXiv:1312.3018 , 2013 . A. Gharaibeh, T. Reza, E. Santos-Neto, L. B. Costa, S. Sallinen, and M. Ripeanu. Efficient large-scale graph processing on hybrid cpu and gpu systems. arXiv preprint arXiv:1312.3018, 2013."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2016.7761622"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196924"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1782174.1782200"},{"key":"e_1_2_1_39_1","unstructured":"M. Hussein A. Varshney and L. Davis. On implementing graph cuts on cuda.  M. Hussein A. Varshney and L. Davis. On implementing graph cuts on cuda."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.80"},{"key":"e_1_2_1_41_1","volume-title":"A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices","author":"Karypis G.","year":"1998","unstructured":"G. Karypis and V. Kumar . A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices . 1998 . G. Karypis and V. Kumar. A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. 1998."},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898719918","volume-title":"Graph algorithms in the language of linear algebra","author":"Kepner J.","year":"2011","unstructured":"J. Kepner and J. Gilbert . Graph algorithms in the language of linear algebra . SIAM , 2011 . J. Kepner and J. Gilbert. Graph algorithms in the language of linear algebra. SIAM, 2011."},{"key":"e_1_2_1_43_1","volume-title":"An efficient heuristic procedure for partitioning graphs. Bell system technical journal, 49(2):291--307","author":"Kernighan B. W.","year":"1970","unstructured":"B. W. Kernighan and S. Lin . An efficient heuristic procedure for partitioning graphs. Bell system technical journal, 49(2):291--307 , 1970 . B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell system technical journal, 49(2):291--307, 1970."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378529"},{"key":"e_1_2_1_45_1","volume-title":"Lawrence Livermore National Lab.(LLNL)","author":"Kumfert G. K.","year":"2000","unstructured":"G. K. Kumfert . Object-oriented algorithmic laboratory for ordering sparse matrices. Technical report , Lawrence Livermore National Lab.(LLNL) , Livermore, CA ( United States) , 2000 . G. K. Kumfert. Object-oriented algorithmic laboratory for ordering sparse matrices. Technical report, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), 2000."},{"key":"e_1_2_1_46_1","volume-title":"USENIX","author":"Kyrola A.","year":"2012","unstructured":"A. Kyrola , G. E. Blelloch , and C. Guestrin . Graphchi: Large-scale graph computation on just a pc . USENIX , 2012 . A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. USENIX, 2012."},{"key":"e_1_2_1_47_1","first-page":"459","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Lee E.","year":"2019","unstructured":"E. Lee , J. Kim , K. Lim , S. H. Noh , and J. Seo . Pre-select static caching and neighborhood ordering for bfs-like algorithms on disk-based graph engines . In 2019 USENIX Annual Technical Conference (USENIX ATC 19) , pages 459 -- 474 , Renton, WA , July 2019 . USENIX Association. E. Lee, J. Kim, K. Lim, S. H. Noh, and J. Seo. Pre-select static caching and neighborhood ordering for bfs-like algorithms on disk-based graph engines. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 459--474, Renton, WA, July 2019. USENIX Association."},{"key":"e_1_2_1_48_1","unstructured":"J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data June 2014.  J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data June 2014."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304044"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2320716"},{"key":"e_1_2_1_51_1","first-page":"1","volume-title":"Storage and Analysis, 2015 SC-International Conference for","author":"Liu H.","year":"2015","unstructured":"H. Liu and H. H. Huang . Enterprise: Breadth-first graph traversal on gpus. In High Performance Computing, Networking , Storage and Analysis, 2015 SC-International Conference for , pages 1 -- 12 . IEEE, 2015 . H. Liu and H. H. Huang. Enterprise: Breadth-first graph traversal on gpus. In High Performance Computing, Networking, Storage and Analysis, 2015 SC-International Conference for, pages 1--12. IEEE, 2015."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837274.1837289"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064191"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-15712-8_22"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2013.05.007"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2717511"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0196-6774(03)00076-2"},{"issue":"1","key":"e_1_2_1_58_1","first-page":"60","article-title":"The small world problem","volume":"2","author":"Milgram S.","year":"1967","unstructured":"S. Milgram . The small world problem . Psychology today , 2 ( 1 ): 60 -- 67 , 1967 . S. Milgram. The small world problem. Psychology today, 2(1):60--67, 1967.","journal-title":"Psychology today"},{"key":"e_1_2_1_59_1","volume-title":"Introducing the graph 500","author":"Murphy R. C.","year":"2010","unstructured":"R. C. Murphy , K. B. Wheeler , B. W. Barrett , and J. A. Ang . Introducing the graph 500 . Cray Users Group (CUG) , 19:45--74, 2010 . R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. Cray Users Group (CUG), 19:45--74, 2010."},{"key":"e_1_2_1_60_1","first-page":"1","volume-title":"Storage and Analysis, 2015 SC-International Conference for","author":"Nai L.","year":"2015","unstructured":"L. Nai , Y. Xia , I. G. Tanase , H. Kim , and C.-Y. Lin . Graphbig : understanding graph computing in the context of industrial solutions. In High Performance Computing, Networking , Storage and Analysis, 2015 SC-International Conference for , pages 1 -- 12 . IEEE, 2015 . L. Nai, Y. Xia, I. G. Tanase, H. Kim, and C.-Y. Lin. Graphbig: understanding graph computing in the context of industrial solutions. In High Performance Computing, Networking, Storage and Analysis, 2015 SC-International Conference for, pages 1--12. IEEE, 2015."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.117"},{"key":"e_1_2_1_62_1","volume-title":"Closeness centrality extended to unconnected graphs: The harmonic centrality index. Technical report","author":"Rochat Y.","year":"2009","unstructured":"Y. Rochat . Closeness centrality extended to unconnected graphs: The harmonic centrality index. Technical report , 2009 . Y. Rochat. Closeness centrality extended to unconnected graphs: The harmonic centrality index. Technical report, 2009."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/2888116.2888372"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02289527"},{"key":"e_1_2_1_65_1","unstructured":"N. Sakharnykh. Maximizing unified memory performance in cuda | nvidia developer blog. https:\/\/devblogs.nvidia.com\/maximizing- unified-memory-performance-cuda\/. (Accessed on 02\/27\/2020).  N. Sakharnykh. Maximizing unified memory performance in cuda | nvidia developer blog. https:\/\/devblogs.nvidia.com\/maximizing- unified-memory-performance-cuda\/. (Accessed on 02\/27\/2020)."},{"key":"e_1_2_1_66_1","volume-title":"May","author":"Sakharnykh N.","year":"2017","unstructured":"N. Sakharnykh . Unified memory on pascal and volta. http:\/\/on-demand.gputechconf.com\/gtc\/2017\/presentation\/s7285-nikolay-sakharnykh-unified- memory-on-pascal-and-volta.pdf , May 2017 . N. Sakharnykh. Unified memory on pascal and volta. http:\/\/on-demand.gputechconf.com\/gtc\/2017\/presentation\/s7285-nikolay-sakharnykh-unified- memory-on-pascal-and-volta.pdf, May 2017."},{"key":"e_1_2_1_67_1","volume-title":"Parboil: A revised benchmark suite for scientific and commercial throughput computing","author":"Stratton J. A.","year":"2012","unstructured":"J. A. Stratton , C. Rodrigues , I.-J. Sung , N. Obeid , L.-W. Chang , N. Anssari , G. D. Liu , and W.-m. W. Hwu . Parboil: A revised benchmark suite for scientific and commercial throughput computing . Center for Reliable and High-Performance Computing , 127, 2012 . J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, N. Anssari, G. D. Liu, and W.-m. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing, 127, 2012."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098057"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1137\/0220006"},{"key":"e_1_2_1_70_1","first-page":"11","volume-title":"ACM SIGPLAN Notices","author":"Wang Y.","unstructured":"Y. Wang , A. Davidson , Y. Pan , Y. Wu , A. Riffel , and J. D. Owens . Gunrock: A high-performance graph processing library on the gpu . In ACM SIGPLAN Notices , volume 51 , page 11 . ACM, 2016. Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. In ACM SIGPLAN Notices, volume 51, page 11. ACM, 2016."},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915220"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8257937"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446077"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3384345.3384358","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:54:42Z","timestamp":1672221282000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3384345.3384358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3]]},"references-count":73,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["10.14778\/3384345.3384358"],"URL":"https:\/\/doi.org\/10.14778\/3384345.3384358","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2020,3]]}}}