{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T08:15:45Z","timestamp":1775808945276,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"13","license":[{"start":{"date-parts":[[2023,4,8]],"date-time":"2023-04-08T00:00:00Z","timestamp":1680912000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,8]],"date-time":"2023-04-08T00:00:00Z","timestamp":1680912000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"German government's aviation research program","award":["FKZ20X1704A"],"award-info":[{"award-number":["FKZ20X1704A"]}]},{"name":"Deutsches Zentrum f\u00fcr Luft- und Raumfahrt e. V. (DLR)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Developments in numerical simulation of flows and high-performance computing influence one another. More detailed simulation methods create a permanent need for more computational power, while new hardware developments often require changes to the software to exploit new hardware features. This dependency is very pronounced in the case of vector-units which are featured by all modern processors to increase their numerical throughput but require vectorization of the software to be used efficiently. We study the vectorization of a simulation method that exhibits an inherent level of vector-parallelism. This is of particular interest as SIMD operations will hopefully be available with <jats:italic>std::simd<\/jats:italic> in a future C++ standard. The simulation method considered here results in the simultaneous solution of multiple sparse linear systems of equations which only differ by their main diagonal and right-hand sides. Such structure arises in the simulation of unsteady flow in turbomachinery by means of a frequency domain approach called harmonic balance.<\/jats:p>","DOI":"10.1007\/s11227-023-05220-4","type":"journal-article","created":{"date-parts":[[2023,4,8]],"date-time":"2023-04-08T07:02:53Z","timestamp":1680937373000},"page":"14684-14706","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["SIMD vectorization for simultaneous solution of locally varying linear systems with multiple right-hand sides"],"prefix":"10.1007","volume":"79","author":[{"given":"Martin J.","family":"K\u00fchn","sequence":"first","affiliation":[]},{"given":"Johannes","family":"Holke","sequence":"additional","affiliation":[]},{"given":"Annette","family":"Lutz","sequence":"additional","affiliation":[]},{"given":"Jonas","family":"Thies","sequence":"additional","affiliation":[]},{"given":"Melven","family":"R\u00f6hrig-Z\u00f6llner","sequence":"additional","affiliation":[]},{"given":"Alexander","family":"Bleh","sequence":"additional","affiliation":[]},{"given":"Jan","family":"Backhaus","sequence":"additional","affiliation":[]},{"given":"Achim","family":"Basermann","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,8]]},"reference":[{"issue":"1","key":"5220_CR1","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1146\/annurev-fluid-031221-105530","volume":"54","author":"RD Sandberg","year":"2022","unstructured":"Sandberg RD, Michelassi V (2022) Fluid dynamics of axial turbomachinery: blade- and stage-level simulations and models. Annu Rev Fluid Mech 54(1):255\u2013285. https:\/\/doi.org\/10.1146\/annurev-fluid-031221-105530","journal-title":"Annu Rev Fluid Mech"},{"issue":"5","key":"5220_CR2","doi-asserted-by":"publisher","first-page":"879","DOI":"10.2514\/2.1754","volume":"40","author":"KC Hall","year":"2002","unstructured":"Hall KC, Thomas JP, Clark WS (2002) Computation of unsteady nonlinear flows in cascades using a harmonic balance technique. AIAA J 40(5):879\u2013886. https:\/\/doi.org\/10.2514\/2.1754","journal-title":"AIAA J"},{"key":"5220_CR3","doi-asserted-by":"publisher","unstructured":"Frey C, Ashcroft G, Kersken H-P, Voigt C (2014). A harmonic balance technique for multistage turbomachinery applications. https:\/\/doi.org\/10.1115\/GT2014-25230","DOI":"10.1115\/GT2014-25230"},{"key":"5220_CR4","doi-asserted-by":"crossref","unstructured":"Krzikalla O, Rempke A, Bleh A, Wagner M, Gerhold T (2021) Spliss: a sparse linear system solver for transparent integration of emerging HPC technologies into CFD solvers and applications. In: STAB\/DGLR Symposium 2020: New Results in Numerical and Experimental Fluid Mechanics XIII, pp 635\u2013645","DOI":"10.1007\/978-3-030-79561-0_60"},{"key":"5220_CR5","unstructured":"Kretz M (2015) Extending C++ for explicit data-parallel programming via SIMD vector types. PhD thesis. https:\/\/publikationen.ub.uni-frankfurt.de\/frontdoor\/index\/index\/docId\/38415"},{"key":"5220_CR6","unstructured":"McMullen MS (2003) The application of non-linear frequency domain methods to the Euler and Navier\u2013Stokes equations. PhD thesis, Stanford University"},{"key":"5220_CR7","volume-title":"Mathematical Aspects of Discontinuous Galerkin Methods. Math\u00e9matiques et Applications","author":"DA Di Pietro","year":"2011","unstructured":"Di Pietro DA, Ern A (2011) Mathematical Aspects of Discontinuous Galerkin Methods. Math\u00e9matiques et Applications, vol 69. Springer, Heidelberg"},{"key":"5220_CR8","series-title":"Frontiers in Applied Mathematics","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898717440","volume-title":"Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation","author":"B Rivi\u00e8re","year":"2008","unstructured":"Rivi\u00e8re B (2008) Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation. Frontiers in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia"},{"key":"5220_CR9","doi-asserted-by":"publisher","DOI":"10.1201\/ebk1439811924","volume-title":"Introduction to High Performance Computing for Scientists and Engineers","author":"G Hager","year":"2010","unstructured":"Hager G, Wellein G (2010) Introduction to High Performance Computing for Scientists and Engineers. CRC Press, Boca Raton. https:\/\/doi.org\/10.1201\/ebk1439811924"},{"key":"5220_CR10","unstructured":"Naishlos D (2004) Autovectorization in GCC. In: Proceedings of the 2004 GCC developers summit, pp 105\u2013118"},{"key":"5220_CR11","doi-asserted-by":"publisher","unstructured":"Bramas B (2017) A novel hybrid quicksort algorithm vectorized using AVX-512 on intel Skylake. Int J Adv Comput Sci Appl 8(10). https:\/\/doi.org\/10.14569\/IJACSA.2017.081044","DOI":"10.14569\/IJACSA.2017.081044"},{"key":"5220_CR12","unstructured":"Watkins JA (2019) A fast and simple approach to merge sorting using AVX-512. Georgia Institute of Technology"},{"key":"5220_CR13","unstructured":"Sansone G, Cococcioni M. Experiments on speeding up the recursive fast Fourier transform by using AVX-512 SIMD instructions. https:\/\/www.researchgate.net\/publication\/364102036_Experiments_on_Speeding_Up_the_Recursive_Fast_Fourier_Transform_by_using_AVX-512_SIMD_instructions"},{"issue":"11","key":"5220_CR14","doi-asserted-by":"publisher","first-page":"2582","DOI":"10.1109\/TPDS.2020.2996314","volume":"31","author":"L Szustak","year":"2020","unstructured":"Szustak L, Wyrzykowski R, Olas T, Mele V (2020) Correlation of performance optimizations and energy consumption for stencil-based application on Intel Xeon scalable processors. IEEE Trans Parallel Distrib Syst 31(11):2582\u20132593. https:\/\/doi.org\/10.1109\/TPDS.2020.2996314","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"5220_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2022.111234","volume":"463","author":"S Long","year":"2022","unstructured":"Long S, Fan X, Li C, Liu Y, Fan S, Guo X-W, Yang C (2022) Vecdualsphysics: a vectorized implementation of smoothed particle hydrodynamics method for simulating fluid flows on multi-core processors. J Comput Phys 463:111234","journal-title":"J Comput Phys"},{"issue":"3","key":"5220_CR16","doi-asserted-by":"publisher","first-page":"1999","DOI":"10.1007\/s11227-019-02839-0","volume":"76","author":"T Jakobs","year":"2020","unstructured":"Jakobs T, Naumann B, R\u00fcnger G (2020) Performance and energy consumption of the SIMD Gram\u2013Schmidt process for vector orthogonalization. J Supercomput 76(3):1999\u20132021","journal-title":"J Supercomput"},{"key":"5220_CR17","doi-asserted-by":"publisher","unstructured":"Cebri\u00e1n JM, Jahre M, Natvig L (2014) Optimized hardware for suboptimal software: the case for SIMD-aware benchmarks. In: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 66\u201375. https:\/\/doi.org\/10.1109\/ISPASS.2014.6844462","DOI":"10.1109\/ISPASS.2014.6844462"},{"issue":"3","key":"5220_CR18","doi-asserted-by":"publisher","first-page":"2082","DOI":"10.1007\/s11227-019-02840-7","volume":"76","author":"JM Cebrian","year":"2020","unstructured":"Cebrian JM, Natvig L, Jahre M (2020) Scalability analysis of AVX-512 extensions. J Supercomput 76(3):2082\u20132097","journal-title":"J Supercomput"},{"key":"5220_CR19","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-7918-2","volume-title":"Modern Parallel Programming with C++ and Assembly","author":"D Kusswurm","year":"2022","unstructured":"Kusswurm D (2022) Modern Parallel Programming with C++ and Assembly. Springer, Geneva"},{"issue":"4","key":"5220_CR20","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1498765.1498785","volume":"52","author":"S Williams","year":"2009","unstructured":"Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65\u201376. https:\/\/doi.org\/10.1145\/1498765.1498785","journal-title":"Commun ACM"},{"key":"5220_CR21","doi-asserted-by":"publisher","unstructured":"Treibig J, Hager G, Wellein G (2010) LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: 39th International Conference on Parallel Processing Workshops, pp 207\u2013216. https:\/\/doi.org\/10.1109\/icppw.2010.38","DOI":"10.1109\/icppw.2010.38"},{"issue":"1","key":"5220_CR22","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s13272-015-0179-7","volume":"7","author":"N Kroll","year":"2016","unstructured":"Kroll N, Abu-Zurayk M, Dimitrov D, Franz T, F\u00fchrer T, Gerhold T, G\u00f6rtz S, Heinrich R, Ilic C, Jepsen J, J\u00e4gersk\u00fcpper J, Kruse M, Krumbein A, Langer S, Liu D, Liepelt R, Reimer L, Ritter M, Schw\u00f6ppe A, Scherer J, Spiering F, Thormann R, Togiti V, Vollmer D, Wendisch J-H (2016) DLR project Digital-X: towards virtual aircraft design and flight testing based on high-fidelity methods. CEAS Aeronaut J 7(1):3\u201327. https:\/\/doi.org\/10.1007\/s13272-015-0179-7. (Accessed 2022-04-26)","journal-title":"CEAS Aeronaut J"},{"key":"5220_CR23","unstructured":"Message Passing Interface Forum: MPI: A Message-Passing Interface Standard Version 4.0. (2021). https:\/\/www.mpi-forum.org\/docs\/mpi-4.0\/mpi40-report.pdf"},{"key":"5220_CR24","doi-asserted-by":"publisher","unstructured":"Alrutz T, Backhaus J, Brandes T, End V, Gerhold T, Geiger A, Gr\u00fcnewald D, Heuveline V, J\u00e4gersk\u00fcpper J, Kn\u00fcpfer A, Krzikalla O, Kuegeler E, Lojewski C, Lonsdale G, M\u00fcller-Pfefferkorn R, Nagel W, Oden L, Pfreundt F-J, Rahn M, Weiss J-P (2013) GASPI\u2014a partitioned global address space programming interface, pp 135\u2013136. https:\/\/doi.org\/10.1007\/978-3-642-35893-7_18","DOI":"10.1007\/978-3-642-35893-7_18"},{"key":"5220_CR25","doi-asserted-by":"crossref","unstructured":"Matthes A, Widera R, Zenker E, Worpitz B, Huebl A, Bussmann M (2017) Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library. In: ISC High Performance 2017, pp 496\u2013514","DOI":"10.1007\/978-3-319-67630-2_36"},{"key":"5220_CR26","doi-asserted-by":"publisher","unstructured":"Stengel H, Treibig J, Hager G, Wellein G (2015) Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model. In: Proceedings of the 29th ACM on International Conference on Supercomputing. https:\/\/doi.org\/10.1145\/2751205.2751240","DOI":"10.1145\/2751205.2751240"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-023-05220-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-023-05220-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-023-05220-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T11:13:28Z","timestamp":1687864408000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-023-05220-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,8]]},"references-count":26,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["5220"],"URL":"https:\/\/doi.org\/10.1007\/s11227-023-05220-4","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,8]]},"assertion":[{"value":"20 March 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}