{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T10:07:32Z","timestamp":1764842852514,"version":"build-2065373602"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T00:00:00Z","timestamp":1710806400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T00:00:00Z","timestamp":1710806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing"},{"name":"Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing"},{"name":"Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing"},{"DOI":"10.13039\/501100003981","name":"Agenzia Spaziale Italiana","doi-asserted-by":"publisher","award":["2018-24-HH.0","2018-24-HH.0","2018-24-HH.0"],"award-info":[{"award-number":["2018-24-HH.0","2018-24-HH.0","2018-24-HH.0"]}],"id":[{"id":"10.13039\/501100003981","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006692","name":"Universit\u00e0 degli Studi di Torino","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006692","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The computing capacity needed to process the data generated in modern scientific experiments is approaching ExaFLOPs. Currently, achieving such performances is only feasible through GPU-accelerated supercomputers. 
Different languages were developed to program GPUs at different levels of abstraction. Typically, the more abstract the languages, the more portable they are across different GPUs. However, the less abstract and co-designed with the hardware, the more room for code optimization and, eventually, the more performance. In the HPC context, portability and performance are a fairly traditional dichotomy. The current C++ Parallel Standard Template Library (PSTL) has the potential to go beyond this dichotomy. In this work, we analyze the main performance benefits and limitations of PSTL using as a use-case the Gaia Astrometric Verification Unit-Global Sphere Reconstruction parallel solver developed by the European Space Agency Gaia mission. The code aims to find the astrometric parameters of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\sim10^8$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mo>\u223c<\/mml:mo>\n                    <mml:msup>\n                      <mml:mn>10<\/mml:mn>\n                      <mml:mn>8<\/mml:mn>\n                    <\/mml:msup>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> stars in the Milky Way by iteratively solving a linear system of equations with the LSQR algorithm, originally GPU-ported with the CUDA language. 
We show that the performance obtained with the PSTL version, which is intrinsically more portable than CUDA, is comparable to the CUDA one on NVIDIA GPU architecture.<\/jats:p>","DOI":"10.1007\/s11227-024-06011-1","type":"journal-article","created":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T19:02:47Z","timestamp":1710874967000},"page":"14369-14390","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Toward HPC application portability via C++ PSTL: the Gaia AVU-GSR code assessment"],"prefix":"10.1007","volume":"80","author":[{"given":"Giulio","family":"Malenza","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Valentina","family":"Cesare","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marco","family":"Aldinucci","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ugo","family":"Becciani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alberto","family":"Vecchiato","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,3,19]]},"reference":[{"key":"6011_CR1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.6090425","author":"P Carpenter","year":"2022","unstructured":"Carpenter P, Utz U-H, Narasimhamurthy S, Suarez E (2022) Heterogeneous high performance computing. Zenodo. https:\/\/doi.org\/10.5281\/zenodo.6090425","journal-title":"Zenodo"},{"issue":"1","key":"6011_CR2","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1109\/99.660313","volume":"5","author":"L Dagum","year":"1998","unstructured":"Dagum L, Menon R (1998) OpenMP: an industry-standard API for shared-memory programming. IEEE Comput Sci Eng 5(1):46\u201355. 
https:\/\/doi.org\/10.1109\/99.660313","journal-title":"IEEE Comput Sci Eng"},{"key":"6011_CR3","volume-title":"Parallel programming with OpenACC","author":"R Farber","year":"2016","unstructured":"Farber R (2016) Parallel programming with OpenACC, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco","edition":"1"},{"key":"6011_CR4","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.jpdc.2021.05.017","volume":"157","author":"M Aldinucci","year":"2021","unstructured":"Aldinucci M, Cesare V, Colonnelli I, Martinelli AR, Mittone G, Cantalupo B, Cavazzoni C, Drocco M (2021) Practical parallelization of scientific applications with OpenMP, OpenACC and MPI. J Parallel Distrib Comput 157:13\u201329. https:\/\/doi.org\/10.1016\/j.jpdc.2021.05.017","journal-title":"J Parallel Distrib Comput"},{"key":"6011_CR5","doi-asserted-by":"publisher","unstructured":"Reed DA, Gannon D, Dongarra JJ (2022) Reinventing high performance computing: challenges and opportunities. arXiv:2203.02544, https:\/\/doi.org\/10.48550\/arXiv.2203.02544","DOI":"10.48550\/arXiv.2203.02544"},{"key":"6011_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.102584","author":"V Amaral","year":"2019","unstructured":"Amaral V, Norberto B, Goul\u00e3o M, Aldinucci M, Benkner S, Bracciali A, Carreira P, Celms E, Correia L, Grelck C, Karatza H, Kessler C, Kilpatrick P, Martiniano H, Mavridis I, Pllana S, Resp\u00edcio A, Sim\u00e3o J, Veiga L, Visa A (2019) Programming languages for data-intensive HPC applications: a systematic mapping study. Parallel Comput. https:\/\/doi.org\/10.1016\/j.parco.2019.102584","journal-title":"Parallel Comput"},{"key":"6011_CR7","unstructured":"open-std.org. https:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2013\/n3724.pdf. Accessed 15-01-2024 (2013)"},{"key":"6011_CR8","unstructured":"The Khronos SYCL Working Group (2021) SYCL 2020 Specification (revision 4). Rev. 8. 
https:\/\/registry.khronos.org\/SYCL\/specs\/sycl-2020\/pdf\/sycl-2020.pdf"},{"issue":"12","key":"6011_CR9","doi-asserted-by":"publisher","first-page":"3202","DOI":"10.1016\/j.jpdc.2014.07.003","volume":"74","author":"HC Edwards","year":"2014","unstructured":"Edwards HC, Trott CR, Sunderland D (2014) Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J Parallel Distrib Comput 74(12):3202\u20133216. https:\/\/doi.org\/10.1016\/j.jpdc.2014.07.003","journal-title":"J Parallel Distrib Comput"},{"key":"6011_CR10","doi-asserted-by":"publisher","unstructured":"Aldinucci M, Ruggieri S, Torquati M (2010) Porting decision tree algorithms to multicore using FastFlow. In: Balc\u00e1zar JL, Bonchi F, Gionis A, Sebag M (eds) Proceedings of European Conference in Machine Learning and Knowledge Discovery in Databases (ECML PKDD). LNCS, vol 6321. Springer, Barcelona, pp 7\u201323. https:\/\/doi.org\/10.1007\/978-3-642-15880-3_7","DOI":"10.1007\/978-3-642-15880-3_7"},{"key":"6011_CR11","unstructured":"AMD (2021) AMD HIP Programming Guide. Rev. 1210. https:\/\/raw.githubusercontent.com\/RadeonOpenCompute\/ROCm\/rocm-4.5.2\/AMD_HIP_Programming_Guide.pdf"},{"key":"6011_CR12","doi-asserted-by":"publisher","unstructured":"Latt J, Coreixas C, Marson F, Thyagarajan K, Santana\u00a0Neto JP, S S, Brito G (2021) Porting a scientific application to GPU using C++ standard parallelism. https:\/\/doi.org\/10.13140\/RG.2.2.27117.92647","DOI":"10.13140\/RG.2.2.27117.92647"},{"key":"6011_CR13","doi-asserted-by":"publisher","unstructured":"Gomez U, Brito Gadeschi G, Weinzierl T (2023) GPU offloading in ExaHyPE through C++ standard algorithms. https:\/\/doi.org\/10.48550\/arXiv.2302.09005, arXiv:2302.09005 [cs.MS]","DOI":"10.48550\/arXiv.2302.09005"},{"key":"6011_CR14","doi-asserted-by":"publisher","unstructured":"Lin W-C, Deakin T, McIntosh-Smith S (2022) Evaluating ISO C++ parallel algorithms on heterogeneous HPC systems. 
In: 2022 IEEE\/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pp 36\u201347. https:\/\/doi.org\/10.1109\/PMBS56514.2022.00009","DOI":"10.1109\/PMBS56514.2022.00009"},{"key":"6011_CR15","doi-asserted-by":"publisher","unstructured":"Becciani U, Sciacca E, Bandieramonte M, Vecchiato A, Bucciarelli B, Lattanzi MG (2014) Solving a very large-scale sparse linear system with a parallel algorithm in the Gaia mission. In: 2014 International Conference on High Performance Computing & Simulation (HPCS), pp 104\u2013111. https:\/\/doi.org\/10.1109\/HPCSim.2014.6903675","DOI":"10.1109\/HPCSim.2014.6903675"},{"issue":"1","key":"6011_CR16","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1145\/355984.355989","volume":"8","author":"CC Paige","year":"1982","unstructured":"Paige CC, Saunders MA (1982) LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans Math Softw (TOMS) 8(1):43\u201371. https:\/\/doi.org\/10.1145\/355984.355989","journal-title":"ACM Trans Math Softw (TOMS)"},{"issue":"2","key":"6011_CR17","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1145\/355993.356000","volume":"8","author":"CC Paige","year":"1982","unstructured":"Paige CC, Saunders MA (1982) Algorithm 583: LSQR: sparse linear equations and least squares problems. ACM Trans Math Softw (TOMS) 8(2):195\u2013209. https:\/\/doi.org\/10.1145\/355993.356000","journal-title":"ACM Trans Math Softw (TOMS)"},{"key":"6011_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.ascom.2022.100660","volume":"41","author":"V Cesare","year":"2022","unstructured":"Cesare V, Becciani U, Vecchiato A, Lattanzi MG, Pitari F, Raciti M, Tudisco G, Aldinucci M, Bucciarelli B (2022) The Gaia AVU-GSR parallel solver: preliminary studies of a LSQR-based application in perspective of exascale systems. Astron Comput 41:100660. https:\/\/doi.org\/10.1016\/j.ascom.2022.100660. 
arXiv:2212.11675 [astro-ph.IM]","journal-title":"Astron Comput"},{"key":"6011_CR19","unstructured":"Malenza G, et al (2022) Analysis of OpenFOAM performance obtained using modern C++ parallelization techniques. https:\/\/hdl.handle.net\/20.500.11767\/130796"},{"key":"6011_CR20","doi-asserted-by":"publisher","unstructured":"Asahi Y, Padioleau T, Latu G, Bigot J, Grandgirard V, Obrejan K (2022) Performance portable Vlasov code with C++ parallel algorithm. In: 2022 IEEE\/ACM international workshop on performance, portability and productivity in HPC (P3HPC), pp 68\u201380. https:\/\/doi.org\/10.1109\/P3HPC56579.2022.00012","DOI":"10.1109\/P3HPC56579.2022.00012"},{"key":"6011_CR21","doi-asserted-by":"publisher","unstructured":"Bhattacharya M, Calafiura P, Childers T, Dewing M, Dong Z, Gutsche O, Habib S, Ju X, Kirby M, Knoepfel K, Kortelainen M, Kwok M, Leggett C, Lin M, Pascuzzi VR, Strelchenko A, Viren B, Yeo B, Yu H (2022) Portability: a necessary approach for future scientific software. https:\/\/doi.org\/10.48550\/arXiv.2203.09945, arXiv:2203.09945 [physics.comp-ph]","DOI":"10.48550\/arXiv.2203.09945"},{"key":"6011_CR22","doi-asserted-by":"publisher","unstructured":"Atif M, Battacharya M, Calafiura P, Childers T, Dewing M, Dong Z, Gutsche O, Habib S, Knoepfel K, Kortelainen M, Kwok KHM, Leggett C, Lin M, Pascuzzi V, Strelchenko A, Tsulaia V, Viren B, Wang T, Yeo B, Yu H (2023) Evaluating portable parallelization strategies for heterogeneous architectures in high energy physics. https:\/\/doi.org\/10.48550\/arXiv.2306.15869, arXiv:2306.15869 [hep-ex]","DOI":"10.48550\/arXiv.2306.15869"},{"key":"6011_CR23","doi-asserted-by":"publisher","unstructured":"Kang S, Hastings C, Eaton J, Rees B (2023) cuGraph C++ primitives: vertex\/edge-centric building blocks for parallel graph computing. In: 2023 IEEE international parallel and distributed processing symposium workshops (IPDPSW), pp 226\u2013229. 
https:\/\/doi.org\/10.1109\/IPDPSW59300.2023.00045","DOI":"10.1109\/IPDPSW59300.2023.00045"},{"key":"6011_CR24","doi-asserted-by":"publisher","unstructured":"Gaia Collaboration, Vallenari A, Brown AGA, Prusti T, et al (2023) Gaia Data Release 3. Summary of the content and survey properties. Astron Astrophys 674, 1 https:\/\/doi.org\/10.1051\/0004-6361\/202243940, arXiv:2208.00211 [astro-ph.GA]","DOI":"10.1051\/0004-6361\/202243940"},{"key":"6011_CR25","doi-asserted-by":"publisher","unstructured":"Vecchiato A, Bucciarelli B, Lattanzi MG, Becciani U, Bianchi L, Abbas U, Sciacca E, Messineo R, De March R (2018) The global sphere reconstruction (GSR). Demonstrating an independent implementation of the astrometric core solution for Gaia. Astron Astrophys 620:40. https:\/\/doi.org\/10.1051\/0004-6361\/201833254, arXiv:1809.05145 [astro-ph.IM]","DOI":"10.1051\/0004-6361\/201833254"},{"issue":"1049","key":"6011_CR26","doi-asserted-by":"publisher","first-page":"074504","DOI":"10.1088\/1538-3873\/acdf1e","volume":"135","author":"V Cesare","year":"2023","unstructured":"Cesare V, Becciani U, Vecchiato A, Lattanzi MG, Pitari F, Aldinucci M, Bucciarelli B (2023) The MPI + CUDA Gaia AVU-GSR parallel solver toward next-generation Exascale infrastructures. Publ Astron Soc Pac 135(1049):074504. https:\/\/doi.org\/10.1088\/1538-3873\/acdf1e. arXiv:2308.00778 [astro-ph.IM]","journal-title":"Publ Astron Soc Pac"},{"key":"6011_CR27","unstructured":"Cesare V, Becciani U, Vecchiato A, Lattanzi MG, Pitari F, Raciti M, Tudisco G, Aldinucci M, Bucciarelli B (2021) Gaia AVU-GSR parallel solver towards exascale infrastructure. In: Astronomical Data Analysis Software and Systems XXXI, Astronomical Society of the Pacific Conference Series. 
Astronomical Society of the Pacific Conference Series, vol 527, p 457 (in Press)"},{"key":"6011_CR28","doi-asserted-by":"publisher","unstructured":"Cesare V, Becciani U, Vecchiato A, Pitari F, Raciti M, Tudisco G (2022) The Gaia AVU-GSR parallel solver: preliminary porting with OpenACC parallelization language of a LSQR-based application in perspective of exascale systems. INAF Technical Reports 163. https:\/\/doi.org\/10.20371\/INAF\/TechRep\/163","DOI":"10.20371\/INAF\/TechRep\/163"},{"key":"6011_CR29","doi-asserted-by":"publisher","unstructured":"Cesare V, Becciani U, Vecchiato A (2022) The MPI+CUDA Gaia AVU-GSR parallel solver in perspective of next-generation Exascale infrastructures and new green computing milestones. INAF Technical Reports 164. https:\/\/doi.org\/10.20371\/INAF\/TechRep\/164","DOI":"10.20371\/INAF\/TechRep\/164"},{"key":"6011_CR30","doi-asserted-by":"publisher","unstructured":"Aldinucci M, Rabellino S, Pironti M, Spiga F, Viviani P, Drocco M, Guerzoni M, Boella G, Mellia M, Margara P, Drago I, Marturano R, Marchetto G, Piccolo E, Bagnasco S, Lusso S, Vallero S, Attardi G, Barchiesi A, Colla A, Galeazzi F (2018) HPC4AI, an AI-on-demand federated platform endeavour. In: ACM computing frontiers, Ischia, Italy. https:\/\/doi.org\/10.1145\/3203217.3205340","DOI":"10.1145\/3203217.3205340"},{"key":"6011_CR31","doi-asserted-by":"publisher","first-page":"3385","DOI":"10.1109\/ICASSP.2017.7952784","volume-title":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"S Naghibzadeh","year":"2017","unstructured":"Naghibzadeh S, van der Veen A-J (2017) Radioastronomical least squares image reconstruction with iteration regularized Krylov subspaces and beamforming-based prior conditioning. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
IEEE, pp 3385\u20133389. https:\/\/doi.org\/10.1109\/ICASSP.2017.7952784"},{"issue":"12","key":"6011_CR32","doi-asserted-by":"publisher","first-page":"4389","DOI":"10.1007\/s00024-018-1909-7","volume":"175","author":"F Joulidehsar","year":"2018","unstructured":"Joulidehsar F, Moradzadeh A, Doulati Ardejani F (2018) An improved 3D joint inversion method of potential field data using cross-gradient constraint and LSQR method. Pure Appl Geophys 175(12):4389\u20134409. https:\/\/doi.org\/10.1007\/s00024-018-1909-7","journal-title":"Pure Appl Geophys"},{"issue":"4","key":"6011_CR33","doi-asserted-by":"publisher","first-page":"1475","DOI":"10.6038\/pg2019CC0275","volume":"34","author":"S-X Liang","year":"2019","unstructured":"Liang S-X, Jiao Y-J, Fan W-X, Yang B-Z (2019) 3D inversion of magnetic data based on LSQR method and correlation coefficient self constrained. Prog Geophys 34(4):1475\u20131480. https:\/\/doi.org\/10.6038\/pg2019CC0275","journal-title":"Prog Geophys"},{"issue":"2","key":"6011_CR34","doi-asserted-by":"publisher","first-page":"359","DOI":"10.11720\/wtyht.2019.1261","volume":"43","author":"S-X Liang","year":"2019","unstructured":"Liang S-X, Wang Q, Jiao Y-J, Liao G-Z, Jing G (2019) LSQR-analysis and evaluation of the potential field inversion using LSQR method. Geophys Geochem Explor 43(2):359\u2013366. https:\/\/doi.org\/10.11720\/wtyht.2019.1261","journal-title":"Geophys Geochem Explor"},{"key":"6011_CR35","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1016\/j.jelectrocard.2020.08.017","volume":"62","author":"G Bin","year":"2020","unstructured":"Bin G, Wu S, Shao M, Zhou Z, Bin G (2020) IRN-MLSQR: an improved iterative reweight norm approach to the inverse problem of electrocardiography incorporating factorization-free preconditioned LSQR. J Electrocardiol 62:190\u2013199. 
https:\/\/doi.org\/10.1016\/j.jelectrocard.2020.08.017","journal-title":"J Electrocardiol"},{"key":"6011_CR36","doi-asserted-by":"publisher","unstructured":"Jaffri NR, Shi L, Abrar U, Ahmad A, Yang J (2020) Electrical resistance tomographic image enhancement using MRNSD and LSQR. In: Proceedings of the 2020 5th International Conference on Multimedia Systems and Signal Processing, pp 16\u201320. https:\/\/doi.org\/10.1145\/3404716.3404722","DOI":"10.1145\/3404716.3404722"},{"key":"6011_CR37","doi-asserted-by":"publisher","first-page":"202100089","DOI":"10.1002\/jbio.202100089","volume":"14","author":"H Guo","year":"2021","unstructured":"Guo H, Zhao H, Yu J, He X, He X, Song X (2021) X-ray luminescence computed tomography using a hybrid proton propagation model and LASSO-LSQR algorithm. J Biophotonics 14:202100089. https:\/\/doi.org\/10.1002\/jbio.202100089","journal-title":"J Biophotonics"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06011-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-024-06011-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06011-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,10]],"date-time":"2024-06-10T11:04:24Z","timestamp":1718017464000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-024-06011-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,19]]},"references-count":37,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["6011"],"URL":"https:\/\/doi.org\/10.1007\/s11227-024-06011-1","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"type":"print","value":"0920-8542"},{"type":"electronic","value":"1573-0484"}],"subject":[],"published":{"date-parts":[[2024,3,19]]},"assertion":[{"value":"19 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}