{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T20:12:55Z","timestamp":1768075975672,"version":"3.49.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,7]]},"abstract":"<jats:p>We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred between processors in a decentralized fashion. As a consequence it is a lock-free parallel algorithm. In spite of being asynchronous, the variable updates of NOMAD are serializable, that is, there is an equivalent update ordering in a serial implementation. 
NOMAD outperforms synchronous algorithms that require explicit bulk synchronization after every iteration: our extensive empirical evaluation shows that not only does our algorithm perform well in a distributed setting on commodity hardware, but it also outperforms state-of-the-art algorithms on an HPC cluster in both multi-core and distributed-memory settings.<\/jats:p>","DOI":"10.14778\/2732967.2732973","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"975-986","source":"Crossref","is-referenced-by-count":75,"title":["NOMAD"],"prefix":"10.14778","volume":"7","author":[{"given":"Hyokun","family":"Yun","sequence":"first","affiliation":[{"name":"Purdue University"}]},{"given":"Hsiang-Fu","family":"Yu","sequence":"additional","affiliation":[{"name":"University of Texas, Austin"}]},{"given":"Cho-Jui","family":"Hsieh","sequence":"additional","affiliation":[{"name":"University of Texas, Austin"}]},{"given":"S. V. N.","family":"Vishwanathan","sequence":"additional","affiliation":[{"name":"Purdue University"}]},{"given":"Inderjit","family":"Dhillon","sequence":"additional","affiliation":[{"name":"University of Texas, Austin"}]}],"member":"320","published-online":{"date-parts":[[2014,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Hadoop 2009. http:\/\/hadoop.apache.org\/core\/."},{"key":"e_1_2_1_2_1","unstructured":"Graphlab datasets 2013. http:\/\/graphlab.org\/downloads\/datasets\/."},{"key":"e_1_2_1_3_1","unstructured":"Intel thread building blocks 2013. https:\/\/www.threadingbuildingblocks.org\/."},{"key":"e_1_2_1_4_1","volume-title":"A reliable effective terascale linear learning system. CoRR, abs\/1110.4198","author":"Agarwal A.","year":"2011","unstructured":"A. 
Agarwal, O. Chapelle, M. Dud\u00edk, and J. Langford. A reliable effective terascale linear learning system. CoRR, abs\/1110.4198, 2011."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1345448.1345465"},{"key":"e_1_2_1_6_1","volume-title":"Parallel and Distributed Computation: Numerical Methods","author":"Bertsekas D. P.","year":"1997","unstructured":"D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","first-page":"351","DOI":"10.7551\/mitpress\/8996.003.0015","volume-title":"Optimization for Machine Learning","author":"Bottou L.","year":"2011","unstructured":"L. Bottou and O. Bousquet. The tradeoffs of large-scale learning. Optimization for Machine Learning, pages 351--368, 2011."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(69)90028-7"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_10_1","first-page":"8","article-title":"The Yahoo! music dataset and KDD-Cup'11","volume":"18","author":"Dror G.","year":"2012","unstructured":"G. Dror, N. Koenigstein, Y. Koren, and M. Weimer. The Yahoo! music dataset and KDD-Cup'11. 
Journal of Machine Learning Research-Proceedings Track, 18: 8--18, 2012.","journal-title":"Journal of Machine Learning Research-Proceedings Track"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-0427(00)00409-X"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020426"},{"key":"e_1_2_1_13_1","first-page":"17","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Gonzalez J. E.","year":"2012","unstructured":"J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, pages 17--30, 2012."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2007.06.006"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020577"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4684-9352-8"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12532-013-0053-8"},{"key":"e_1_2_1_19_1","first-page":"693","volume-title":"Proceedings of the conference on Advances in Neural Information Processing Systems","author":"Recht B.","year":"2011","unstructured":"B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. 
In Proceedings of the conference on Advances in Neural Information Processing Systems, pages 693--701, 2011."},{"key":"e_1_2_1_20_1","volume-title":"Distributed coordinate descent method for learning with big data","author":"Richtarik P.","year":"2013","unstructured":"P. Richtarik and M. Takac. Distributed coordinate descent method for learning with big data. 2013. URL \"http:\/\/arxiv.org\/abs\/1310.2059\"."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390273"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920931"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963491"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2012.120"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2012.168"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68880-8_32"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507164"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2732967.2732973","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,10]],"date-time":"2023-08-10T12:21:02Z","timestamp":1691670062000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2732967.2732973"}},"subtitle":["non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix 
completion"],"short-title":[],"issued":{"date-parts":[[2014,7]]},"references-count":28,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2014,7]]}},"alternative-id":["10.14778\/2732967.2732973"],"URL":"https:\/\/doi.org\/10.14778\/2732967.2732973","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,7]]}}}