{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T05:07:31Z","timestamp":1768108051502,"version":"3.49.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,11]]},"abstract":"<jats:p>We present Lux, a distributed multi-GPU system that achieves fast graph processing by exploiting the aggregate memory bandwidth of multiple GPUs and taking advantage of locality in the memory hierarchy of multi-GPU clusters. Lux provides two execution models that optimize algorithmic efficiency and enable important GPU optimizations, respectively. Lux also uses a novel dynamic load balancing strategy that is cheap and achieves good load balance across GPUs. In addition, we present a performance model that quantitatively predicts the execution times and automatically selects the runtime configurations for Lux applications. 
Experiments show that Lux achieves up to 20X speedup over state-of-the-art shared memory systems and up to two orders of magnitude speedup over distributed systems.<\/jats:p>","DOI":"10.14778\/3157794.3157799","type":"journal-article","created":{"date-parts":[[2017,12,12]],"date-time":"2017-12-12T18:33:38Z","timestamp":1513103618000},"page":"297-310","source":"Crossref","is-referenced-by-count":65,"title":["A distributed multi-GPU system for fast graph processing"],"prefix":"10.14778","volume":"11","author":[{"given":"Zhihao","family":"Jia","sequence":"first","affiliation":[{"name":"Stanford University"}]},{"given":"Yongkee","family":"Kwon","sequence":"additional","affiliation":[{"name":"UT Austin"}]},{"given":"Galen","family":"Shipman","sequence":"additional","affiliation":[{"name":"LANL"}]},{"given":"Pat","family":"McCormick","sequence":"additional","affiliation":[{"name":"LANL"}]},{"given":"Mattan","family":"Erez","sequence":"additional","affiliation":[{"name":"UT Austin"}]},{"given":"Alex","family":"Aiken","sequence":"additional","affiliation":[{"name":"Stanford University"}]}],"member":"320","published-online":{"date-parts":[[2017,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon ratings network dataset. http:\/\/konect.uni-koblenz.de\/networks\/amazon-ratings."},{"key":"e_1_2_1_2_1","unstructured":"Apache giraph. http:\/\/http:\/\/giraph.apache.org\/."},{"key":"e_1_2_1_3_1","unstructured":"Intel Xeon Processor E5-2680 v2. http:\/\/ark.intel.com\/products\/75277\/Intel-Xeon-Processor-E5-2680-v2."},{"key":"e_1_2_1_4_1","unstructured":"Intel Xeon Processor E7-4860 v2. 
https:\/\/ark.intel.com\/products\/75249\/Intel-Xeon-Processor-E7-4860-v2."},{"key":"e_1_2_1_5_1","unstructured":"Lonestar 5 user guide. https:\/\/portal.tacc.utexas.edu\/user-guides\/lonestar5."},{"key":"e_1_2_1_6_1","unstructured":"NVIDIA NVLink high-speed interconnect. http:\/\/www.nvidia.com\/object\/nvlink.html."},{"key":"e_1_2_1_7_1","unstructured":"NVidia Tesla K80. http:\/\/www.anandtech.com\/show\/8729\/nvidia-launches-tesla-k80-gk210-gpu."},{"key":"e_1_2_1_8_1","unstructured":"Server Memory Prices. https:\/\/memory.net\/memory-prices\/."},{"key":"e_1_2_1_9_1","unstructured":"XStream Cray CS-Storm compute cluster. http:\/\/xstream.stanford.edu\/."},{"key":"e_1_2_1_10_1","unstructured":"Yahoo! Altavista web page hyperlink connectivity graph. http:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=g."},{"key":"e_1_2_1_11_1","unstructured":"Yahoo! music user ratings of songs with artist album and genre meta information. 
\"https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=r\"."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389086"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389013"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018756"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of KDD cup and workshop","author":"Bennett James","year":"2007","unstructured":"James Bennett, Stan Lanning, et al. The Netflix Prize. In Proceedings of KDD cup and workshop, 2007."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972740.43"},{"key":"e_1_2_1_19_1","volume-title":"Introduction to Algorithms","author":"Cormen Thomas H.","year":"2001","unstructured":"Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.","edition":"2"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.2307\/3033543"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2621934.2621936"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. 
In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, 2012."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14","author":"Gonzalez Joseph E.","year":"2014","unstructured":"Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14, 2014."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2827872"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807620"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600227"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915204"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_2_1_29_1","volume-title":"Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11(Feb)","author":"Leskovec Jure","year":"2010","unstructured":"
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11(Feb), 2010."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145832"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610518"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915238"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505617"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628084"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465371"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2012.6507501"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688538"},{"key":"e_1_2_1_42_1","volume-title":"NSDI'12","author":"Zaharia Matei","year":"2008","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. 
In NSDI'12, San Jose, CA, 2008."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688507"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735501"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3157794.3157799","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:37:15Z","timestamp":1672220235000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3157794.3157799"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,11]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,11]]}},"alternative-id":["10.14778\/3157794.3157799"],"URL":"https:\/\/doi.org\/10.14778\/3157794.3157799","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,11]]}}}