{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T02:52:38Z","timestamp":1776394358670,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2016,5,31]],"date-time":"2016-05-31T00:00:00Z","timestamp":1464652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2016,5,31]]},"DOI":"10.1145\/2907294.2907297","type":"proceedings-article","created":{"date-parts":[[2016,6,2]],"date-time":"2016-06-02T19:23:42Z","timestamp":1464895422000},"page":"219-230","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":42,"title":["Faster and Cheaper"],"prefix":"10.1145","author":[{"given":"Wei","family":"Tan","sequence":"first","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY, USA"}]},{"given":"Liangliang","family":"Cao","sequence":"additional","affiliation":[{"name":"Yahoo! Labs, New York City, NY, USA"}]},{"given":"Liana","family":"Fong","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, NY, USA"}]}],"member":"320","published-online":{"date-parts":[[2016,5,31]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"ICML","author":"Arora S.","year":"2013","unstructured":"S. Arora , R. Ge , Y. Halpern , D. Mimno , A. Moitra , D. Sontag , Y. Wu , and M. Zhu . A practical algorithm for topic modeling with provable guarantees . In ICML , 2013 . S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu. A practical algorithm for topic modeling with provable guarantees. In ICML, 2013."},{"key":"e_1_3_2_1_2_1","first-page":"351","volume-title":"JMLR","author":"Bottou L.","year":"2011","unstructured":"L. Bottou and O. Bousquet . The tradeo-offs of large-scale learning. optimization for machine learning . JMLR , pages 351 -- 368 , 2011 . L. Bottou and O. Bousquet. The tradeo-offs of large-scale learning. optimization for machine learning. JMLR, pages 351--368, 2011."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33078-0_22"},{"key":"e_1_3_2_1_4_1","volume-title":"EMNLP","author":"Canny J.","year":"2013","unstructured":"J. Canny , D. L. W. Hall , and D. Klein . A multi-teraflop constituency parser using gpus . In EMNLP , 2013 . J. Canny, D. L. W. Hall, and D. Klein. A multi-teraflop constituency parser using gpus. In EMNLP, 2013."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2668133"},{"key":"e_1_3_2_1_6_1","first-page":"1337","volume-title":"ICML","author":"Coates A.","year":"2013","unstructured":"A. Coates , B. Huval , T. Wang , D. Wu , B. Catanzaro , and N. Andrew . Deep learning with cots hpc systems . In ICML , pages 1337 -- 1345 , 2013 . A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and N. Andrew. Deep learning with cots hpc systems. In ICML, pages 1337--1345, 2013."},{"key":"e_1_3_2_1_7_1","volume-title":"USENIX ATC","author":"Cui H.","year":"2014","unstructured":"H. Cui , J. Cipar , Q. Ho , J. K. Kim , S. Lee , A. Kumar , J. Wei , W. Dai , G. R. Ganger , P. B. Gibbons , G. A. Gibson , and E. P. Xing . Exploiting bounded staleness to speed up big data analytics . In USENIX ATC , 2014 . H. Cui, J. Cipar, Q. Ho, J. K. Kim, S. Lee, A. Kumar, J. Wei, W. Dai, G. R. Ganger, P. B. Gibbons, G. A. Gibson, and E. P. Xing. Exploiting bounded staleness to speed up big data analytics. In USENIX ATC, 2014."},{"key":"e_1_3_2_1_8_1","volume-title":"NIPS","author":"Dean J.","year":"2012","unstructured":"J. Dean , G. S. Corrado , R. Monga , K. Chen , M. Devin , Q. V. Le , M. Z. Mao , M. Ranzato , A. Senior , P. Tucker , K. Yang , and A. Y. Ng . Large scale distributed deep networks . In NIPS , 2012 . J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012."},{"key":"e_1_3_2_1_9_1","volume-title":"KDD Cup 2011 competition","author":"Dror G.","year":"2012","unstructured":"G. Dror , N. Koenigstein , Y. Koren , and M. Weimer . The yahoo! music dataset and kdd-cup '11 . In KDD Cup 2011 competition , 2012 . G. Dror, N. Koenigstein, Y. Koren, and M. Weimer. The yahoo! music dataset and kdd-cup '11. In KDD Cup 2011 competition, 2012."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020426"},{"key":"e_1_3_2_1_11_1","volume-title":"Recommending items to more than a billion people. https:\/\/code.facebook.com\/posts\/861999383875667\/recommending-items-to-more-than-a-billion-people\/","author":"Kabiljo M.","year":"2015","unstructured":"M. Kabiljo and A. Ilic . Recommending items to more than a billion people. https:\/\/code.facebook.com\/posts\/861999383875667\/recommending-items-to-more-than-a-billion-people\/ , 2015 . {Online; accessed 17-Aug-2015}. M. Kabiljo and A. Ilic. Recommending items to more than a billion people. https:\/\/code.facebook.com\/posts\/861999383875667\/recommending-items-to-more-than-a-billion-people\/, 2015. {Online; accessed 17-Aug-2015}."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_1_13_1","volume-title":"NIPS","author":"Krizhevsky A.","year":"2012","unstructured":"A. Krizhevsky , I. Sutskever , and G. E. Hinton . Imagenet classification with deep convolutional neural networks . In NIPS , 2012 . A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452449"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_3_2_1_16_1","volume-title":"MLlib: Machine Learning in Apache Spark. CoRR, abs\/1505.06807","author":"Meng X.","year":"2015","unstructured":"X. Meng , J. K. Bradley , B. Yavuz , E. R. Sparks , S. Venkataraman , D. Liu , J. Freeman , D. B. Tsai , M. Amde , S. Owen , D. Xin , R. Xin , M. J. Franklin , R. Zadeh , M. Zaharia , and A. Talwalkar . MLlib: Machine Learning in Apache Spark. CoRR, abs\/1505.06807 , 2015 . X. Meng, J. K. Bradley, B. Yavuz, E. R. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. B. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine Learning in Apache Spark. CoRR, abs\/1505.06807, 2015."},{"key":"e_1_3_2_1_17_1","volume-title":"NIPS","author":"Niu F.","year":"2011","unstructured":"F. Niu , B. Recht , C. Re , and S. J. Wright . HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent . In NIPS , 2011 . F. Niu, B. Recht, C. Re, and S. J. Wright. HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011."},{"key":"e_1_3_2_1_18_1","unstructured":"NVidia CUDA 7.0. cuBLAS. http:\/\/docs.nvidia.com\/cuda\/cublas\/ 2015. {Online; accessed 17-Aug-2015}.  NVidia CUDA 7.0. cuBLAS. http:\/\/docs.nvidia.com\/cuda\/cublas\/ 2015. {Online; accessed 17-Aug-2015}."},{"key":"e_1_3_2_1_19_1","unstructured":"NVidia CUDA 7.0. cuSPARSE. http:\/\/docs.nvidia.com\/cuda\/cusparse\/#cusparse-lt-t-gt-csrmm2 2015. {Online; accessed 4-Aug-2015}.  NVidia CUDA 7.0. cuSPARSE. http:\/\/docs.nvidia.com\/cuda\/cusparse\/#cusparse-lt-t-gt-csrmm2 2015. {Online; accessed 4-Aug-2015}."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1864708.1864726"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/245108.245121"},{"key":"e_1_3_2_1_23_1","unstructured":"T. Rohrmann. How to factorize a 700 GB matrix with Apache Flink. http:\/\/data-artisans.com\/how-to-factorize-a-700-gb-matrix-with-apache-flink\/ 2015. {Online; accessed 15-Aug-2015}.  T. Rohrmann. How to factorize a 700 GB matrix with Apache Flink. http:\/\/data-artisans.com\/how-to-factorize-a-700-gb-matrix-with-apache-flink\/ 2015. {Online; accessed 15-Aug-2015}."},{"key":"e_1_3_2_1_24_1","volume-title":"NIPS Workshop on Distributed Matrix Computations","author":"Schelter S.","year":"2014","unstructured":"S. Schelter , V. Satuluri , and R. B. Zadeh . Factorbird-a parameter server approach to distributed matrix factorization . In NIPS Workshop on Distributed Matrix Computations , 2014 . S. Schelter, V. Satuluri, and R. B. Zadeh. Factorbird-a parameter server approach to distributed matrix factorization. In NIPS Workshop on Distributed Matrix Computations, 2014."},{"key":"e_1_3_2_1_25_1","volume-title":"Web data: Amazon reviews. https:\/\/snap.stanford.edu\/data\/web-Amazon.html","author":"Lab SNAP","year":"2015","unstructured":"Stanford SNAP Lab . Web data: Amazon reviews. https:\/\/snap.stanford.edu\/data\/web-Amazon.html , 2015 . {Online; accessed 18-Aug-2015}. Stanford SNAP Lab. Web data: Amazon reviews. https:\/\/snap.stanford.edu\/data\/web-Amazon.html, 2015. {Online; accessed 18-Aug-2015}."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2012.120"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783323"},{"key":"e_1_3_2_1_28_1","volume-title":"Scalable Collaborative Filtering with Spark MLlib. https:\/\/databricks.com\/blog\/2014\/07\/23\/scalable-collaborative-filtering-with-spark-mllib.html","author":"Yavuz B.","year":"2014","unstructured":"B. Yavuz , X. Meng , and R. Xin . Scalable Collaborative Filtering with Spark MLlib. https:\/\/databricks.com\/blog\/2014\/07\/23\/scalable-collaborative-filtering-with-spark-mllib.html , 2014 . {Online; accessed 15-Aug-2015}. B. Yavuz, X. Meng, and R. Xin. Scalable Collaborative Filtering with Spark MLlib. https:\/\/databricks.com\/blog\/2014\/07\/23\/scalable-collaborative-filtering-with-spark-mllib.html, 2014. {Online; accessed 15-Aug-2015}."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2012.168"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732967.2732973"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68880-8_32"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507164"}],"event":{"name":"HPDC'16: The 25th International Symposium on High-Performance Parallel and Distributed Computing","location":"Kyoto Japan","acronym":"HPDC'16","sponsor":["University of Arizona University of Arizona","SIGARCH ACM Special Interest Group on Computer Architecture","SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"]},"container-title":["Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2907294.2907297","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2907294.2907297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:54:25Z","timestamp":1750222465000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2907294.2907297"}},"subtitle":["Parallelizing Large-Scale Matrix Factorization on GPUs"],"short-title":[],"issued":{"date-parts":[[2016,5,31]]},"references-count":32,"alternative-id":["10.1145\/2907294.2907297","10.1145\/2907294"],"URL":"https:\/\/doi.org\/10.1145\/2907294.2907297","relation":{},"subject":[],"published":{"date-parts":[[2016,5,31]]},"assertion":[{"value":"2016-05-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}