{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T08:14:20Z","timestamp":1769760860209,"version":"3.49.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,9,10]],"date-time":"2022-09-10T00:00:00Z","timestamp":1662768000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["GRK2379"],"award-info":[{"award-number":["GRK2379"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Tensor decompositions, such as CANDECOMP\/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this article, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. 
Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time, the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.<\/jats:p>","DOI":"10.1145\/3519383","type":"journal-article","created":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T11:43:53Z","timestamp":1651232633000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Algorithm\u00a01026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions"],"prefix":"10.1145","volume":"48","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6057-7491","authenticated-orcid":false,"given":"Christos","family":"Psarras","sequence":"first","affiliation":[{"name":"RWTH Aachen University, Aachen, North Rhine-Westphalia, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4675-7434","authenticated-orcid":false,"given":"Lars","family":"Karlsson","sequence":"additional","affiliation":[{"name":"Ume\u00e5 Universitet, MIT-huset, Ume\u00e5, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7641-4854","authenticated-orcid":false,"given":"Rasmus","family":"Bro","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Rolighedsvej, Copenhagen, Frederiksberg C, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4972-7097","authenticated-orcid":false,"given":"Paolo","family":"Bientinesi","sequence":"additional","affiliation":[{"name":"Ume\u00e5 Universitet, MIT-huset, Ume\u00e5, 
Sweden"}]}],"member":"320","published-online":{"date-parts":[[2022,9,10]]},"reference":[{"key":"e_1_3_3_2_2","volume-title":"Public Data Sets for Multivariate Data Analysis","unstructured":"[n.d.]. Public Data Sets for Multivariate Data Analysis. Department of Food Science, University of Copenhagen. Retrieved from http:\/\/www.models.life.ku.dk\/datasets."},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1002\/cem.1335"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1002\/cem.790"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1186785.1186794"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1137\/060676489"},{"key":"e_1_3_3_7_2","unstructured":"Brett W. Bader Tamara G. Kolda et\u00a0al. 2019. MATLAB Tensor Toolbox Version 3.1. Retrieved from https:\/\/www.tensortoolbox.org."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1137\/17M1112303"},{"key":"e_1_3_3_9_2","volume-title":"Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications.","author":"Bro Rasmus","year":"1998","unstructured":"Rasmus Bro. 1998. Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. Ph.D. Dissertation. University of Amsterdam."},{"key":"e_1_3_3_10_2","article-title":"The N-way Toolbox","author":"Bro Rasmus","year":"2020","unstructured":"Rasmus Bro. 2020. The N-way Toolbox. MATLAB Central File Exchange. 
Retrieved from https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/1088-the-n-way-toolbox.","journal-title":"MATLAB Central File Exchange"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7439(98)00011-2"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1099-128X(199709\/10)11:5<393::AID-CEM483>3.0.CO;2-L"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.chemolab.2004.04.014"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02310791"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2017.2777393"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2007.04.041"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3432185"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.23919\/EUSIPCO.2018.8553084"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v057.i07"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1080\/10556788.2015.1009977"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.08.055"},{"key":"e_1_3_3_22_2","first-page":"1","article-title":"Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis.","volume":"16","author":"Harshman R. A.","year":"1970","unstructured":"R. A. Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis. UCLA Work. Pap. Phonet. 16 (1970), 1\u201384.","journal-title":"UCLA Work. Pap. Phonet."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178522"},{"key":"e_1_3_3_24_2","article-title":"Intel\u00ae Math Kernel Library documentation","author":"Corporation Intel","year":"2020","unstructured":"Intel Corporation. 2020. Intel\u00ae Math Kernel Library documentation. 
Retrieved from https:\/\/software.intel.com\/en-us\/mkl-reference-manual-for-c.","journal-title":"Retrieved from https:\/\/software.intel.com\/en-us\/mkl-reference-manual-for-c"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3016078.2851152"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2018.00076"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/3322706.3322732"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11306-011-0310-7"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.80"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/SPAWC.2018.8445941"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1002\/nla.2431"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1039\/C3AY41160E"},{"key":"e_1_3_3_34_2","unstructured":"NVIDIA P\u00e9ter Vingelmann and Frank H. P. Fitzek. 2020. CUDA release: 11.0. Retrieved from https:\/\/developer.nvidia.com\/cuda-toolkit."},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7439(97)00031-2"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2729980"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-17248-4_10"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2013.2269903"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1137\/18M1210691"},{"key":"e_1_3_3_40_2","unstructured":"Christos Psarras Lars Karlsson Jiajia Li and Paolo Bientinesi. 2021. The landscape of software for tensor computations. 
Retrieved from arxiv:2103.13756 [cs.MS]."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1137\/06065577"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1021\/ac00293a054"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053849"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2017.2690524"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.27"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.112"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1137\/110830034"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3104988"},{"key":"e_1_3_3_49_2","unstructured":"The MathWorks Inc. 2020. Matlab. Retrieved from http:\/\/www.mathworks.com\/."},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.chemolab.2004.07.003"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/2764454"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2015.2503260"},{"key":"e_1_3_3_53_2","unstructured":"N. Vervliet O. Debals L. Sorber M. Van Barel and L. De Lathauwer. 2016. Tensorlab 3.0. 
Retrieved from https:\/\/www.tensorlab.net."},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2012.97"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2019.2936486"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3519383","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3519383","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:05Z","timestamp":1750186805000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3519383"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,10]]},"references-count":55,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3519383"],"URL":"https:\/\/doi.org\/10.1145\/3519383","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,10]]},"assertion":[{"value":"2020-10-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-02-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}