{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T12:32:58Z","timestamp":1772022778754,"version":"3.50.1"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T00:00:00Z","timestamp":1624665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000015","name":"Department of Energy","doi-asserted-by":"crossref","award":["DE-AC05-00OR22725"],"award-info":[{"award-number":["DE-AC05-00OR22725"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). 
We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and\/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.<\/jats:p>","DOI":"10.1145\/3432185","type":"journal-article","created":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T16:14:12Z","timestamp":1624724052000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["PLANC"],"prefix":"10.1145","volume":"47","author":[{"given":"Srinivas","family":"Eswar","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"given":"Koby","family":"Hayashi","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1557-8027","authenticated-orcid":false,"given":"Grey","family":"Ballard","sequence":"additional","affiliation":[{"name":"Wake Forest University, Winston-Salem, NC"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5852-4806","authenticated-orcid":false,"given":"Ramakrishnan","family":"Kannan","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN"}]},{"given":"Michael A.","family":"Matheson","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN"}]},{"given":"Haesun","family":"Park","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]}],"member":"320","published-online":{"date-parts":[[2021,6,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2697055"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"Efficient MATLAB computations with sparse and factored tensors","volume":"30","author":"Bader Brett W.","year":"2007","unstructured":"Brett W. Bader and Tamara G. Kolda . 2007 . 
Efficient MATLAB computations with sparse and factored tensors. SIAM J. Sci. Comput. 30, 1 (Dec. 2007), 205\u2013231. DOI:https:\/\/doi.org\/10.1137\/060676489","journal-title":"SIAM J. Sci. Comput."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2018.00012"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00065"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976137.1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2012.6408676"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1112303"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000016"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7439(98)00011-2"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1099-128X(199709\/10)11:5<393::AID-CEM483>3.0.CO;2-L"},{"key":"e_1_2_1_11_1","volume-title":"\u201cEckart-Young","author":"Douglas Carroll J.","year":"1970","unstructured":"J. Douglas Carroll and Jih-Jie Chang. 1970. Analysis of individual differences in multidimensional scaling via an n-way generalization of \u201cEckart-Young\u201d decomposition. Psychometrika 35, 3 (01 Sep 1970), 283\u2013319. 
DOI:https:\/\/doi.org\/10.1007\/BF02310791"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1285358.1285359"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-020-03181-6"},{"key":"e_1_2_1_14_1","volume-title":"Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans. Fund. Electron. Commun. Comput. Sci. E92-A","author":"Cichocki Andrzej","year":"2009","unstructured":"Andrzej Cichocki and Anh-Huy Phan. 2009. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans. Fund. Electron. Commun. Comput. Sci. E92-A (2009), 708\u2013721. Issue 3. http:\/\/dx.doi.org\/10.1587\/transfun.E92.A.708"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2008.4408452"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-010-0331-6"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.01.007"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201920)","author":"Eswar S.","unstructured":"S. Eswar, K. Hayashi, G. Ballard, R. Kannan, R. Vuduc, and H. Park. 2020. Distributed-memory parallel symmetric nonnegative matrix factorization. 
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201920). IEEE Computer Society, 1041\u20131054. Retrieved from https:\/\/dl.acm.org\/doi\/10.5555\/3433701.3433799."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682465"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1137\/090764189"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0962492914000087"},{"key":"e_1_2_1_22_1","first-page":"54","article-title":"Alternating projected Barzilai-Borwein methods for nonnegative matrix factorization","volume":"36","author":"Han Lixing","year":"2009","unstructured":"Lixing Han, Michael Neumann, and Upendra Prasad. 2009. Alternating projected Barzilai-Borwein methods for nonnegative matrix factorization. Electron. Trans. Numer. Anal 36, 6 (2009), 54\u201382. Retrieved from http:\/\/etna.mcs.kent.edu\/volumes\/2001-2010\/vol36\/abstract.php?vol=36&pages=54-82.","journal-title":"Electron. Trans. Numer. Anal"},{"key":"e_1_2_1_23_1","first-page":"1","article-title":"Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis","volume":"16","author":"Harshman Richard A.","year":"1970","unstructured":"Richard A. Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis. 
Working Papers Phonet. 16, 10,085 (1970), 1\u201384. http:\/\/www.psychology.uwo.ca\/faculty\/harshman\/wpppfac0.pdf","journal-title":"Working Papers Phonet."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 23rd European Signal Processing Conference (EUSIPCO\u201915)","author":"Huang Kejun","year":"2015","unstructured":"Kejun Huang, Nicholas D. Sidiropoulos, and Athanasios P. Liavas. 2015. Efficient algorithms for universally constrained matrix and tensor factorization. In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO\u201915). IEEE, 2521\u20132525. http:\/\/dx.doi.org\/10.1109\/EUSIPCO.2015.7362839"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2016.2576427"},{"key":"e_1_2_1_27_1","volume-title":"Lupini","author":"Jesse Stephen","year":"2016","unstructured":"Stephen Jesse, Miaofang Chi, Albina Borisevich, Alexei Belianinov, Sergei Kalinin, Eirik Endeve, Richard K. Archibald, Christopher T. Symons, and Andrew R. Lupini. 2016. Using multivariate analysis of scanning-Ronchigram data to reveal material functionality. Microsc. Microanal. 22 (July 2016), 292\u2013293. 
http:\/\/dx.doi.org\/10.1017\/S1431927616002312"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851152"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2767592"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-018-0525-3"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2016.19"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1102744"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the SIAM International Conference on Data Mining. SIAM, 343\u2013354","author":"Kim Dongmin","year":"2007","unstructured":"Dongmin Kim, Suvrit Sra, and Inderjit S. Dhillon. 2007. Fast Newton-type methods for the least-squares nonnegative matrix approximation problem. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 343\u2013354. https:\/\/doi.org\/10.1137\/1.9781611972771.31"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10898-013-0035-4"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1137\/110821172"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.celrep.2016.12.004"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"e_1_2_1_39_1","volume-title":"Hanson","author":"Lawson Charles L.","year":"1995","unstructured":"Charles L. Lawson and Richard J. Hanson. 1995. Solving Least Squares Problems. Vol. 15. SIAM. 
https:\/\/doi.org\/10.1137\/1.9781611971217"},{"key":"e_1_2_1_40_1","volume-title":"Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755","author":"Lee Daniel D.","year":"1999","unstructured":"Daniel D. Lee and H. Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788. https:\/\/doi.org\/10.1038\/44565"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201917)","author":"Li J.","year":"2017","unstructured":"J. Li, J. Choi, I. Perros, J. Sun, and R. Vuduc. 2017. Model-driven sparse CP decomposition for higher-order tensors. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201917). 1048\u20131057. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2017.80"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2017.2777399"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201917)","author":"Liavas Athanasios P.","year":"2017","unstructured":"Athanasios P. Liavas, Georgios Kostoulas, Georgios Lourakis, Kejun Huang, and Nicholas D. Sidiropoulos. 2017. Nesterov-based parallel algorithm for large-scale nonnegative tensor factorization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201917). IEEE, 5895\u20135899. 
https:\/\/doi.org\/10.1109\/ICASSP.2017.7953287"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2007.19.10.2756"},{"key":"e_1_2_1_45_1","unstructured":"Linjian Ma and Edgar Solomonik. 2018. Accelerating alternating least squares for tensor decomposition by pairwise perturbation. Retrieved from https:\/\/arxiv.org\/abs\/1811.10573."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC50609.2020.00028"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10957-005-2668-z"},{"key":"e_1_2_1_48_1","unstructured":"Gordon E. Moon, Aravind Sukumaran-Rajam, Srinivasan Parthasarathy, and P. Sadayappan. 2019. PL-NMF: Parallel locality-optimized non-negative matrix factorization. Retrieved from https:\/\/arxiv.org\/abs\/1904.07935."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201919)","author":"Nisa Israt","year":"2019","unstructured":"Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Richard Vuduc, and P. Sadayappan. 2019. Load-balanced sparse MTTKRP on GPUs. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201919). 
IEEE, 123\u2013133. https:\/\/doi.org\/10.1109\/IPDPS.2019.00023"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7439(97)00031-2"},{"key":"e_1_2_1_51_1","volume-title":"Sidiropoulos","author":"Papalexakis Evangelos E.","year":"2012","unstructured":"Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D. Sidiropoulos. 2012. Parcube: Sparse parallelizable tensor decompositions. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 521\u2013536. https:\/\/doi.org\/10.1007\/978-3-642-33460-3_39"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2010.06.030"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2013.2269903"},{"key":"e_1_2_1_54_1","volume-title":"TENSORBOX: A MATLAB package for tensor decomposition.","author":"Phan Anh-Huy","year":"2013","unstructured":"Anh-Huy Phan, Petr Tichavsky, and Andrzej Cichocki. 2013. TENSORBOX: A MATLAB package for tensor decomposition. 
Retrieved from https:\/\/github.com\/phananhhuy\/TensorBox."},{"key":"e_1_2_1_55_1","volume-title":"Armadillo: An Open Source C++ Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments. Technical Report. NICTA.","author":"Sanderson Conrad","year":"2010","unstructured":"Conrad Sanderson. 2010. Armadillo: An Open Source C++ Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments. Technical Report. NICTA. Retrieved from http:\/\/arma.sourceforge.net\/armadillo_nicta_2010.pdf."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2017.2690524"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 46th International Conference on Parallel Processing (ICPP\u201917)","author":"Smith S.","year":"2017","unstructured":"S. Smith, A. Beri, and G. Karypis. 2017. Constrained tensor factorization with accelerated AO-ADMM. In Proceedings of the 46th International Conference on Parallel Processing (ICPP\u201917). 111\u2013120. DOI:https:\/\/doi.org\/10.1109\/ICPP.2017.20"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the IEEE 30th International Parallel and Distributed Processing Symposium. 902\u2013911","author":"Smith Shaden","year":"2016","unstructured":"Shaden Smith and George Karypis. 2016. A medium-grained algorithm for distributed sparse tensor factorization. In Proceedings of the IEEE 30th International Parallel and Distributed Processing Symposium. 902\u2013911. 
DOI:https:\/\/doi.org\/10.1109\/IPDPS.2016.113"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium. 61\u201370","author":"Smith S.","year":"2015","unstructured":"S. Smith, N. Ravindran, N. D. Sidiropoulos, and G. Karypis. 2015. SPLATT: Efficient and parallel sparse tensor-matrix multiplication. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium. 61\u201370. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2015.27"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.06.002"},{"key":"e_1_2_1_62_1","volume-title":"Proceedings of the International Conference on Collaborative Computing: Networking, Applications and Worksharing. Springer, 189\u2013201","author":"Tang Bing","year":"2018","unstructured":"Bing Tang, Linyao Kang, Yanmin Xia, and Li Zhang. 2018. GPU-accelerated large-scale non-negative matrix factorization using spark. In Proceedings of the International Conference on Collaborative Computing: Networking, Applications and Worksharing. Springer, 189\u2013201. 
https:\/\/doi.org\/10.1007\/978-3-030-12981-1_13"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005051521"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2004.11.013"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1002\/cem.889"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1152371"},{"key":"e_1_2_1_67_1","unstructured":"Yining Wang, Hsiao-Yu Tung, Alexander J. Smola, and Anima Anandkumar. 2015. Fast and guaranteed tensor decomposition via sketching. In Advances in Neural Information Processing Systems. 991\u2013999. Retrieved from https:\/\/dl.acm.org\/doi\/10.5555\/2969239.2969350."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(01)00070-8"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432185","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3432185","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3432185","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:09Z","timestamp":1750193229000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432185"}},"subtitle":["Parallel Low-rank Approximation with Nonnegativity 
Constraints"],"short-title":[],"issued":{"date-parts":[[2021,6,26]]},"references-count":65,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3432185"],"URL":"https:\/\/doi.org\/10.1145\/3432185","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,26]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}