{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:01:23Z","timestamp":1750309283605,"version":"3.41.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:00:00Z","timestamp":1750204800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:00:00Z","timestamp":1750204800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["955495"],"award-info":[{"award-number":["955495"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"DOI":"10.1007\/s11227-025-07510-5","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:08:40Z","timestamp":1750248520000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Performance portability of generated cardiac simulation kernels through automatic dimensioning and load balancing on heterogeneous nodes"],"prefix":"10.1007","volume":"81","author":[{"given":"Vincent","family":"Alba","sequence":"first","affiliation":[]},{"given":"Olivier","family":"Aumage","sequence":"additional","affiliation":[]},{"given":"Denis","family":"Barthou","sequence":"additional","affiliation":[]},{"given":"Marie-Christine","family":"Counilh","sequence":"additional","affiliation":[]},{"given":"Amina","family":"Guermouche","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"7510_CR1","unstructured":"Nvidia dgx-1 with tesla v100 system architecture the fastest platform for deep learning. (2018)"},{"key":"7510_CR2","doi-asserted-by":"crossref","unstructured":"Alba V, Aumage O, Barthou D, Colin R, Counilh M-C, Genaud S, Guermouche A, Loechner V, Thangamani A (2024) Performance portability of generated cardiac simulation kernels through automatic dimensioning and load balancing on heterogeneous nodes. In: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 1006\u20131015","DOI":"10.1109\/IPDPSW63119.2024.00171"},{"issue":"23","key":"7510_CR3","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1002\/cpe.1631","volume":"2009","author":"C Augonnet","year":"2011","unstructured":"Augonnet C, Thibault S, Namyst R, Wacrenier P-A (2011) StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. CCPE - Concur Comput: Pract Exp, Spec Issue: Euro-Par 2009(23):187\u2013198","journal-title":"CCPE - Concur Comput: Pract Exp, Spec Issue: Euro-Par"},{"issue":"10","key":"7510_CR4","doi-asserted-by":"publisher","first-page":"2421","DOI":"10.1109\/TPDS.2020.2989869","volume":"31","author":"A Cabrera","year":"2020","unstructured":"Cabrera A, Acosta A, Almeida F, Blanco V (2020) A dynamic multi-objective approach for dynamic load balancing in heterogeneous systems. IEEE Trans Parallel Distrib Syst 31(10):2421\u20132434","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"7510_CR5","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/j.cam.2015.02.008","volume":"295","author":"J Campos","year":"2016","unstructured":"Campos J, Oliveira R, dos Santos R, Rocha B (2016) Lattice boltzmann method for parallel simulations of cardiac electrophysiology using gpus. J Comput Appl Math 295:70\u201382 (VIII Pan-American Workshop in Applied and Computational Mathematics)","journal-title":"J Comput Appl Math"},{"key":"7510_CR6","doi-asserted-by":"crossref","unstructured":"Chien S, Peng I, Markidis S (2019) Performance evaluation of advanced features in cuda unified memory. In: 2019 IEEE\/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp 50\u201357","DOI":"10.1109\/MCHPC49590.2019.00014"},{"issue":"1","key":"7510_CR7","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1016\/j.pbiomolbio.2015.12.008","volume":"120","author":"M Clerx","year":"2016","unstructured":"Clerx M, Collins P, de Lange E, Volders PG (2016) Myokit: A simple interface to cardiac cellular electrophysiology. Prog Biophys Mol Biol 120(1):100\u2013114 (Recent Developments in Biophysics and Molecular Biology of Heart Rhythm)","journal-title":"Prog Biophys Mol Biol"},{"issue":"2\u20133","key":"7510_CR8","doi-asserted-by":"publisher","first-page":"20200021","DOI":"10.1515\/jib-2020-0021","volume":"17","author":"M Clerx","year":"2020","unstructured":"Clerx M, Cooling MT, Cooper J, Garny A, Moyle K, Nickerson DP, Nielsen PMF, Sorby H (2020) Cellml 2.0. J Integr Bioinform 17(2\u20133):20200021","journal-title":"J Integr Bioinform"},{"key":"7510_CR9","doi-asserted-by":"crossref","unstructured":"Duch\u00e2teau A, Padua DA, Barthou D (Feb. 2013) Hydra: Automatic algorithm exploration from linear algebra equations. In: Code Generation and Optimization, pages pp 1\u201310, Shenzhen, China","DOI":"10.1109\/CGO.2013.6494999"},{"key":"7510_CR10","doi-asserted-by":"crossref","unstructured":"Faverge M, Furmento N, Guermouche A, Lucas G, Namyst R, Thibault S, Wacrenier P-A (2023) Programming Heterogeneous Architectures Using Hierarchical Tasks. Concurrency and Computation: Practice and Experience 35(25)","DOI":"10.1002\/cpe.7811"},{"key":"7510_CR11","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1016\/j.jpdc.2021.08.001","volume":"158","author":"M Gonz\u00e1lez","year":"2021","unstructured":"Gonz\u00e1lez M, Morancho E (2021) Multi-gpu systems and unified virtual memory for scientific applications: The case of the nas multi-zone parallel benchmarks. J Parallel Distrib Comput 158:138\u2013150","journal-title":"J Parallel Distrib Comput"},{"key":"7510_CR12","doi-asserted-by":"publisher","first-page":"1732","DOI":"10.1007\/s11227-019-02768-y","volume":"75","author":"MAD Guzm\u00e1n","year":"2019","unstructured":"Guzm\u00e1n MAD, Nozal R, Tejero RG, Villarroya-Gaud\u00f3 M, Gracia DS, Bosque JL (2019) Cooperative cpu, gpu, and fpga heterogeneous execution with enginecl. J Supercomput 75:1732\u20131746","journal-title":"J Supercomput"},{"key":"7510_CR13","doi-asserted-by":"crossref","unstructured":"Heldens S, Varbanescu AL, Iosup A (2016) Dynamic load balancing for high-performance graph processing on hybrid cpu-gpu platforms. In: 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3), pp 62\u201365","DOI":"10.1109\/IA3.2016.016"},{"key":"7510_CR14","doi-asserted-by":"crossref","unstructured":"Huchant P, Barthou D, Counilh M-C (Sept. 2018) Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels. In: SBAC-PAD - 30th International Symposium on Computer Architecture and High Performance Computing, Lyon, France","DOI":"10.1109\/CAHPC.2018.8645867"},{"key":"7510_CR15","doi-asserted-by":"crossref","unstructured":"Huchant P, Counilh M-C, Barthou D (Aug.2016) Automatic OpenCL Task Adaptation for Heterogeneous Architectures. Euro-Par. Euro-Par 2016: Parallel Processing. Grenoble, France, pp 684\u2013696","DOI":"10.1007\/978-3-319-43659-3_50"},{"key":"7510_CR16","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1007\/s11227-019-02966-8","volume":"75","author":"M Knap","year":"2019","unstructured":"Knap M, Czarnul P (2019) Performance evaluation of unified memory with prefetching and oversubscription for selected parallel cuda applications on nvidia pascal and volta gpus. J Supercomput 75:11","journal-title":"J Supercomput"},{"key":"7510_CR17","doi-asserted-by":"crossref","unstructured":"Langguth J, Lan Q, Gaur N, Cai X, Wen M, Zhang C-Y (2016) Enabling tissue-scale cardiac simulations using heterogeneous computing on tianhe-2. In: 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp 843\u2013852","DOI":"10.1109\/ICPADS.2016.0114"},{"key":"7510_CR18","doi-asserted-by":"crossref","unstructured":"Lee J, Samadi M, Park Y, Mahlke S (2013) Transparent cpu-gpu collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp 245\u2013255","DOI":"10.1109\/PACT.2013.6618814"},{"issue":"1","key":"7510_CR19","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1109\/TPDS.2019.2928289","volume":"31","author":"A Li","year":"2020","unstructured":"Li A, Song SL, Chen J, Li J, Liu X, Tallent NR, Barker KJ (2020) Evaluating modern gpu interconnect: Pcie, nvlink, nv-sli, nvswitch and gpudirect. IEEE Trans Parallel Distrib Syst 31(1):94\u2013110","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"7510_CR20","unstructured":"London K, Moore S, Mucci P, Seymour K, Luczak R (2001) The papi cross-platform interface to hardware performance counters. In: Department of Defense Users\u2019 Group Conference Proceedings, Biloxi, Mississippi, 2001-06"},{"key":"7510_CR21","doi-asserted-by":"crossref","unstructured":"Nesi LL, Schnorr LM, Legrand A (2022) Multi-phase task-based hpc applications: Quickly learning how to run fast. In: 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 357\u2013367","DOI":"10.1109\/IPDPS53621.2022.00042"},{"key":"7510_CR22","doi-asserted-by":"crossref","unstructured":"Nozal R, Bosque JL, Beivide R (2019) Towards co-execution on commodity heterogeneous systems: Optimizations for time-constrained scenarios. In: 2019 International Conference on High Performance Computing & Simulation (HPCS), pp 628\u2013635","DOI":"10.1109\/HPCS48598.2019.9188188"},{"key":"7510_CR23","doi-asserted-by":"publisher","first-page":"947","DOI":"10.1016\/j.future.2017.08.007","volume":"92","author":"S Pennycook","year":"2019","unstructured":"Pennycook S, Sewall J, Lee V (2019) Implications of a metric for performance portability. Futur Gener Comput Syst 92:947\u2013958","journal-title":"Futur Gener Comput Syst"},{"key":"7510_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.cmpb.2021.106223","volume":"208","author":"G Plank","year":"2021","unstructured":"Plank G, Loewe A, Neic A, Augustin C, Huang Y-L, Gsell MA, Karabelas E, Nothstein M, Prassl AJ, S\u00e1nchez J, Seemann G, Vigmond EJ (2021) The openCARP simulation environment for cardiac electrophysiology. Comput Methods Programs Biomed 208:106223","journal-title":"Comput Methods Programs Biomed"},{"key":"7510_CR25","doi-asserted-by":"crossref","unstructured":"Potse M, Saillard E, Barthou D, Coudi\u00e8re Y (2020) Feasibility of whole-heart electrophysiological models with near-cellular resolution. In: 2020 Computing in Cardiology, pp 1\u20134","DOI":"10.22489\/CinC.2020.126"},{"key":"7510_CR26","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/j.jpdc.2018.11.001","volume":"125","author":"B P\u00e9rez","year":"2019","unstructured":"P\u00e9rez B, Stafford E, Bosque J, Beivide R, Mateo S, Teruel X, Martorell X, Ayguad\u00e9 E (2019) Auto-tuned opencl kernel co-execution in ompss for heterogeneous systems. J Parallel Distrib Comput 125:45\u201357","journal-title":"J Parallel Distrib Comput"},{"key":"7510_CR27","doi-asserted-by":"crossref","unstructured":"Rossignon C, Pascal H, Aumage O, Thibault S (May 2013) A NUMA-aware fine grain parallelization framework for multi-core architecture. In: PDSEC - 14th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - 2013, Boston, United States","DOI":"10.1109\/IPDPSW.2013.204"},{"key":"7510_CR28","doi-asserted-by":"crossref","unstructured":"Spafford K, Meredith JS, Vetter JS (2011) Quantifying numa and contention effects in multi-gpu systems. In: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, New York, NY, USA. Association for Computing Machinery","DOI":"10.1145\/1964179.1964194"},{"key":"7510_CR29","doi-asserted-by":"crossref","unstructured":"Thangamani A, Jost TT, Loechner V, Genaud S, Bramas B (2023) Lifting code generation of cardiac physiology simulation to novel compiler technology. In: Proceedings of the 21st ACM\/IEEE International Symposium on Code Generation and Optimization, CGO 2023, pp 68-80, New York, NY, USA. Association for Computing Machinery","DOI":"10.1145\/3579990.3580008"},{"key":"7510_CR30","doi-asserted-by":"crossref","unstructured":"Trevisan\u00a0Jost T, Thangamani A, Colin R, Loechner V, Genaud S, Bramas B (2023) Gpu code generation of cardiac electrophysiology simulation with mlir. In: Euro-Par 2023: Parallel Processing: 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28 - September 1, 2023, Proceedings, pp 549-563, Berlin, Heidelberg. Springer-Verlag","DOI":"10.1007\/978-3-031-39698-4_37"},{"key":"7510_CR31","doi-asserted-by":"crossref","unstructured":"Turimbetov I, Sasongko MA, Unat D (2024) Gpu-initiated resource allocation for irregular workloads. In: Proceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions, ExHET \u201924, pp 1-8, New York, NY, USA. Association for Computing Machinery","DOI":"10.1145\/3642961.3643799"},{"key":"7510_CR32","unstructured":"Vigmond E (2021) EasyML. https:\/\/opencarp.org\/documentation\/examples\/01_ep_single_cell\/05_easyml"},{"key":"7510_CR33","doi-asserted-by":"crossref","unstructured":"Wu W, Bouteiller A, Bosilca G, Faverge M, Dongarra J (May 2015) Hierarchical DAG Scheduling for Hybrid Distributed Systems. In: IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015), Hyderabad, India","DOI":"10.1109\/IPDPS.2015.56"},{"key":"7510_CR34","unstructured":"Youness H, Osama M, Tarek a (2012) Load balancing on cpu-gpu heterogeneous system. 12"},{"issue":"2","key":"7510_CR35","doi-asserted-by":"publisher","first-page":"360","DOI":"10.4304\/jcp.9.2.360-367","volume":"9","author":"L Zhang","year":"2014","unstructured":"Zhang L, Wang K, Zuo W, Gai C (2014) G-heart: A gpu-based system for electrophysiological simulation and multi-modality cardiac visualization. J Comput 9(2):360\u2013367","journal-title":"J Comput"},{"key":"7510_CR36","doi-asserted-by":"crossref","unstructured":"Zhong Z, Rychkov V, Lastovetsky A (2012) Data partitioning on heterogeneous multicore and multi-gpu systems using functional performance models of data-parallel applications. In: 2012 IEEE International Conference on Cluster Computing, pp 191\u2013199","DOI":"10.1109\/CLUSTER.2012.34"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07510-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-025-07510-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07510-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:12Z","timestamp":1750291392000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-025-07510-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,18]]},"references-count":36,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["7510"],"URL":"https:\/\/doi.org\/10.1007\/s11227-025-07510-5","relation":{},"ISSN":["1573-0484"],"issn-type":[{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,18]]},"assertion":[{"value":"26 May 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 June 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"1047"}}