{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T22:55:12Z","timestamp":1777676112526,"version":"3.51.4"},"reference-count":42,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T00:00:00Z","timestamp":1740528000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T00:00:00Z","timestamp":1740528000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"Helmholtz Association of German Research Centres"},{"DOI":"10.13039\/100021130","name":"Bundesministerium f\u00fcr Wirtschaft und Klimaschutz","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100021130","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,5]]},
"abstract":"<jats:p>\n                    Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data parallelism in these algorithms is key to using modern hardware. However, there are several ways to map required tensor operations onto linear algebra routines (\u201cbuilding blocks\u201d). Optimizing this mapping impacts the numerical behavior, so computational and numerical aspects must be considered hand-in-hand. In this paper we discuss the performance of solvers for low-rank linear systems in the tensor-train format (also known as matrix-product states). We consider three popular algorithms: TT-GMRES, MALS, and AMEn. We illustrate their computational complexity based on the example of discretizing a simple high-dimensional PDE in, for example, 50\n                    <jats:sup>10<\/jats:sup>\n                    grid points. This shows that the projection to smaller sub-problems for MALS and AMEn reduces the number of floating-point operations by orders of magnitude. We suggest optimizations regarding orthogonalization steps, singular value decompositions, and tensor contractions. In addition, we propose a generic preconditioner based on a TT-rank-1 approximation of the linear operator. Overall, we obtain roughly a 5\u00d7 speedup over the reference algorithm for the fastest method (AMEn) on a current multicore CPU.\n                  <\/jats:p>","DOI":"10.1177\/10943420251317994","type":"journal-article","created":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T19:43:06Z","timestamp":1740598986000},"page":"443-461","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Performance of linear solvers in tensor-train format on current multicore architectures"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9851-5886","authenticated-orcid":false,"given":"Melven","family":"R\u00f6hrig-Z\u00f6llner","sequence":"first","affiliation":[{"name":"German Aerospace Center (DLR)"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manuel","family":"Becklas","sequence":"additional","affiliation":[{"name":"German Aerospace Center (DLR)"}],"role":[{"role":"author","vocabulary":"crossref"}]},
{"ORCID":"https:\/\/orcid.org\/0000-0001-9231-9999","authenticated-orcid":false,"given":"Jonas","family":"Thies","sequence":"additional","affiliation":[{"name":"Delft University of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3637-3231","authenticated-orcid":false,"given":"Achim","family":"Basermann","sequence":"additional","affiliation":[{"name":"German Aerospace Center (DLR)"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,2,26]]},"reference":[{"key":"e_1_3_5_2_1","doi-asserted-by":"publisher","DOI":"10.1137\/21m1451191"},{"key":"e_1_3_5_3_1","doi-asserted-by":"publisher","DOI":"10.1002\/nla.1818"},{"key":"e_1_3_5_4_1","doi-asserted-by":"publisher","unstructured":"Carson E Demmel J Grigori L et al. (2016) Write-avoiding algorithms. In: 2016 IEEE international parallel and distributed processing symposium (IPDPS) Chicago IL 23\u201327 May 2016 648\u2013658. IEEE. DOI: 10.1109\/ipdps.2016.114.","DOI":"10.1109\/ipdps.2016.114"},{"key":"e_1_3_5_5_1","unstructured":"Coulaud O Giraud L Iannacito M (2022) A robust gmres algorithm in tensor train format. ArXiv Preprint arXiv:2210.14533."},{"key":"e_1_3_5_6_1","doi-asserted-by":"publisher","DOI":"10.1137\/20m1387158"},{"key":"e_1_3_5_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10208-015-9265-9"},{"key":"e_1_3_5_8_1","doi-asserted-by":"publisher","DOI":"10.1515\/rnam-2013-0009"},{"key":"e_1_3_5_9_1","doi-asserted-by":"publisher","DOI":"10.1137\/18m1198041"},{"key":"e_1_3_5_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/140953289"},{"key":"e_1_3_5_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-005-0615-4"},{"key":"e_1_3_5_12_1","doi-asserted-by":"publisher","DOI":"10.56021\/9781421407944"},
{"key":"e_1_3_5_13_1","unstructured":"Guennebaud G Jacob B (2010) Eigen v3. Version 3.3.9. https:\/\/eigen.tuxfamily.org."},{"key":"e_1_3_5_14_1","doi-asserted-by":"publisher","DOI":"10.1201\/ebk1439811924"},{"key":"e_1_3_5_15_1","doi-asserted-by":"publisher","DOI":"10.1093\/oso\/9780198535645.003.0010"},{"key":"e_1_3_5_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718027"},{"key":"e_1_3_5_17_1","doi-asserted-by":"publisher","DOI":"10.1137\/100818893"},{"key":"e_1_3_5_18_1","unstructured":"Intel (2023) Intel(R) oneAPI math kernel library (oneMKL). https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/onemkl.html.Version2023.2."},{"key":"e_1_3_5_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13160-021-00459-x"},{"key":"e_1_3_5_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00365-011-9131-1"},{"key":"e_1_3_5_21_1","doi-asserted-by":"publisher","DOI":"10.1137\/07070111x"},{"key":"e_1_3_5_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/100799010"},{"key":"e_1_3_5_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/15m1032909"},{"key":"e_1_3_5_24_1","doi-asserted-by":"publisher","DOI":"10.1002\/nla.1839"},{"key":"e_1_3_5_25_1","doi-asserted-by":"publisher","DOI":"10.1137\/090752286"},{"key":"e_1_3_5_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/110833142"},{"key":"e_1_3_5_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/17m1148712"},{"key":"e_1_3_5_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971163"},{"key":"e_1_3_5_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},
{"key":"e_1_3_5_30_1","doi-asserted-by":"publisher","unstructured":"R\u00f6hrig-Z\u00f6llner M Becklas MJ (2024) PITTS - parallel iterative tensor-train solvers. DOI: 10.5281\/zenodo.13762681. URL: https:\/\/github.com\/melven\/pitts.","DOI":"10.5281\/zenodo.13762681"},{"key":"e_1_3_5_31_1","doi-asserted-by":"publisher","DOI":"10.1137\/21m1395545"},{"key":"e_1_3_5_32_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_3_5_33_1","doi-asserted-by":"publisher","DOI":"10.1103\/revmodphys.77.259"},{"key":"e_1_3_5_34_1","doi-asserted-by":"publisher","DOI":"10.1137\/20m1316639"},{"key":"e_1_3_5_35_1","doi-asserted-by":"publisher","DOI":"10.1137\/s1064827502406415"},{"key":"e_1_3_5_36_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00753"},{"key":"e_1_3_5_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3157733"},{"key":"e_1_3_5_38_1","doi-asserted-by":"publisher","unstructured":"Treibig J Hager G Wellein G (2010) LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: 2010 39th international conference on parallel processing workshops San Diego CA 13\u201316 September 2010. IEEE. DOI: 10.1109\/icppw.2010.38.","DOI":"10.1109\/icppw.2010.38"},{"key":"e_1_3_5_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/s0895479802403459"},{"key":"e_1_3_5_40_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511615115"},{"key":"e_1_3_5_41_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.69.2863"},{"key":"e_1_3_5_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_5_43_1","doi-asserted-by":"publisher","DOI":"10.1103\/RevModPhys.55.583"}],
"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251317994","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420251317994","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251317994","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:17:42Z","timestamp":1777450662000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420251317994"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,26]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.1177\/10943420251317994"],"URL":"https:\/\/doi.org\/10.1177\/10943420251317994","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,26]]}}}