{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T10:53:00Z","timestamp":1777632780482,"version":"3.51.4"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,9,10]],"date-time":"2022-09-10T00:00:00Z","timestamp":1662768000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Education, Youth and Sports of the Czech Republic","award":["CZ.02.1.01\/0.0\/0.0\/16_019\/0000765"],"award-info":[{"award-number":["CZ.02.1.01\/0.0\/0.0\/16_019\/0000765"]}]},{"DOI":"10.13039\/501100003243","name":"Ministry of Health of the Czech Republic","doi-asserted-by":"crossref","award":["NV19-08-00071"],"award-info":[{"award-number":["NV19-08-00071"]}],"id":[{"id":"10.13039\/501100003243","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001824","name":"Czech Science Foundation","doi-asserted-by":"crossref","award":["21-09093S"],"award-info":[{"award-number":["21-09093S"]}],"id":[{"id":"10.13039\/501100001824","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Student Grant Agency of the Czech Technical University in Prague","award":["SGS20\/184\/OHK4\/3T\/14"],"award-info":[{"award-number":["SGS20\/184\/OHK4\/3T\/14"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            A general multi-purpose data structure for an efficient representation of\n            <jats:italic>conforming unstructured homogeneous<\/jats:italic>\n            meshes for scientific computations on CPU and GPU-based systems is presented. The data structure is provided as open-source software as part of the TNL library (https:\/\/tnl-project.org\/). The abstract representation supports almost any cell shape and common 2D quadrilateral, 3D hexahedron and arbitrarily dimensional simplex shapes are currently built into the library. The implementation is highly configurable via templates of the C++ language, which allows avoiding the storage of unnecessary dynamic data. The internal memory layout is based on state-of-the-art sparse matrix storage formats, which are optimized for different hardware architectures in order to provide high-performance computations. The proposed data structure is also suitable for meshes decomposed into several subdomains and distributed computing using the Message Passing Interface (MPI). The efficiency of the implemented data structure on CPU and GPU hardware architectures is demonstrated on several benchmark problems and a comparison with another library. Its applicability to advanced numerical methods is demonstrated with an example problem of two-phase flow in porous media using a numerical scheme based on the mixed-hybrid finite element method (MHFEM). We show GPU speed-ups that rise above 20 in 2D and 50 in 3D when compared to sequential CPU computations, and above 2 in 2D and 9 in 3D when compared to 12-threaded CPU computations.\n          <\/jats:p>","DOI":"10.1145\/3536164","type":"journal-article","created":{"date-parts":[[2022,5,20]],"date-time":"2022-05-20T12:25:16Z","timestamp":1653049516000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Configurable Open-source Data Structure for Distributed Conforming Unstructured Homogeneous Meshes with GPU Support"],"prefix":"10.1145","volume":"48","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7941-3913","authenticated-orcid":false,"given":"Jakub","family":"Klinkovsk\u00fd","sequence":"first","affiliation":[{"name":"Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8374-6892","authenticated-orcid":false,"given":"Tom\u00e1\u0161","family":"Oberhuber","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7040-9184","authenticated-orcid":false,"given":"Radek","family":"Fu\u010d\u00edk","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2528-9205","authenticated-orcid":false,"given":"V\u00edt\u011bzslav","family":"\u017dabka","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,9,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061708"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"James Ahrens Berk Geveci and Charles Law. 2005. ParaView: An end-user tool for large-data visualization. In Visualization Handbook Charles D. Hansen and Chris R. Johnson (Eds.). Butterworth-Heinemann Burlington Chapter 36 717\u2013731.","DOI":"10.1016\/B978-012387582-2\/50038-1"},{"key":"e_1_3_2_4_2","first-page":"22","volume-title":"Proceedings of the International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems","author":"Balogh G\u00e1bor D.","year":"2017","unstructured":"G\u00e1bor D. Balogh, Istv\u00e1n Z. Reguly, and Gihan R. Mudalige. 2017. Comparison of parallelisation approaches, languages, and compilers for unstructured mesh algorithms on GPUs. In Proceedings of the International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems. Springer, 22\u201343."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1268776.1268779"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-008-0004-9"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-008-0003-x"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.2172\/10176386"},{"key":"e_1_3_2_9_2","volume-title":"Proceedings of the 1st OpenSG Symposium","author":"Botsch Mario","year":"2002","unstructured":"Mario Botsch, Stephan Steinberg, Stephan Bischoff, and Leif Kobbelt. 2002. OpenMesh \u2013 a generic and efficient polygon mesh data structure. In Proceedings of the 1st OpenSG Symposium."},{"key":"e_1_3_2_10_2","first-page":"233","volume-title":"Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures","author":"Bulu\u00e7 Aydin","year":"2009","unstructured":"Aydin Bulu\u00e7, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures. ACM, 233\u2013244."},{"key":"e_1_3_2_11_2","first-page":"357","volume-title":"Proceedings of the High Performance Visualization\u2013Enabling Extreme-Scale Scientific Insight","author":"Childs Hank","year":"2012","unstructured":"Hank Childs, Eric Brugger, Brad Whitlock, Jeremy Meredith, Sean Ahern, David Pugmire, Kathleen Biagas, Mark Miller, Cyrus Harrison, Gunther H. Weber, Hari Krishnan, Thomas Fogal, Allen Sanderson, Christoph Garth, E. Wes Bethel, David Camp, Oliver R\u00fcbel, Marc Durant, Jean M. Favre, and Paul Navr\u00e1til. 2012. VisIt: An end-user tool for visualizing and analyzing very large data. In Proceedings of the High Performance Visualization\u2013Enabling Extreme-Scale Scientific Insight. 357\u2013372."},{"key":"e_1_3_2_12_2","unstructured":"Paolo Cignoni and Fabio Ganovelli. 2017. Visualization and Computer Graphics Library (VCGlib). (2017). Retrieved November 2017 from http:\/\/vcg.isti.cnr.it\/vcglib\/."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/260627.260647"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/800195.805928"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2014.01.005"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2018.12.004"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.advwatres.2016.02.007"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1002\/nme.2579"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.04.001"},{"key":"e_1_3_2_20_2","article-title":"OpenFOAM user guide version 8","author":"Greenshields Christopher J.","year":"2020","unstructured":"Christopher J. Greenshields. 2020. OpenFOAM user guide version 8. The OpenFOAM Foundation.","journal-title":"The OpenFOAM Foundation"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2814935"},{"key":"e_1_3_2_22_2","unstructured":"Intel. 2017. Xeon Gold 6146 Processor Specifications. (2017). Retrieved February 2021 from https:\/\/en.wikichip.org\/w\/index.php?title=intel\/xeon_gold\/6146&oldid=95146WikiChip Semiconductor & Computer Engineering."},{"key":"e_1_3_2_23_2","first-page":"1","volume-title":"Proceedings of the International Workshop on Coupled Methods in Numerical Dynamics","author":"Jasak Hrvoje","year":"2007","unstructured":"Hrvoje Jasak, Aleksandar Jemcov, and Zeljko Tukovic. 2007. OpenFOAM: A C++ library for complex physics simulations. In Proceedings of the International Workshop on Coupled Methods in Numerical Dynamics. IUC Dubrovnik, Croatia, 1\u201320."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_3_2_25_2","volume-title":"ITPACK 2.0: User\u2019s Guide","author":"Kincaid David R.","year":"1979","unstructured":"David R. Kincaid, John R. Respess, and David M. Young. 1979. ITPACK 2.0: User\u2019s Guide. Technical Report. Center for Numerical Analysis, University of Texas. Retrieved from http:\/\/www.ma.utexas.edu\/CNA\/ITPACK\/."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.5555\/2990018.3226281"},{"key":"e_1_3_2_27_2","unstructured":"Jiri Kraus. 2013. An Introduction to CUDA-Aware MPI. (2013). Retrieved from https:\/\/developer.nvidia.com\/blog\/introduction-cuda-aware-mpi\/. NVIDIA Developer Blog [Online; accessed February 2021]."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2401575"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30561-0_11"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Lo\u00efc Mar\u00e9chal. 2021. The GMlib Library v3.30. (2021). Retrieved from https:\/\/github.com\/LoicMarechal\/GMlib. [Online; accessed February 2021].","DOI":"10.18356\/22202331-2021-3-10"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1029\/WR026i003p00399"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"e_1_3_2_33_2","unstructured":"Francesco Nidito. 2014. Replacing Virtual Methods with Templates. (2014). Retrieved from http:\/\/www.di.unipi.it\/nids\/docs\/templates_vs_inheritance.html. [Online; accessed November 2017]."},{"key":"e_1_3_2_34_2","unstructured":"NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture. (2017). Retrieved February 2021 from http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf."},{"key":"e_1_3_2_35_2","unstructured":"NVIDIA. 2020. CUDA Toolkit Documentation Version 11.2. Nvidia. Retrieved February 2021 from https:\/\/docs.nvidia.com\/cuda\/archive\/11.2.0\/index.html."},{"key":"e_1_3_2_36_2","first-page":"282","volume-title":"Proceedings of the Algoritmy","author":"Oberhuber Tom\u00e1\u0161","year":"2012","unstructured":"Tom\u00e1\u0161 Oberhuber and Martin Heller. 2012. Improved row-grouped CSR format for storing of sparse matrices on GPU. In Proceedings of the Algoritmy. 282\u2013290."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.14311\/AP.2021.61.0122"},{"issue":"4","key":"e_1_3_2_38_2","first-page":"447","article-title":"New row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA","volume":"56","author":"Oberhuber Tom\u00e1\u0161","year":"2011","unstructured":"Tom\u00e1\u0161 Oberhuber, Jan Vacata, and Atsushi Suzuki. 2011. New row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA. Acta Technica CSAV 56, 4 (2011), 447\u2013466.","journal-title":"Acta Technica CSAV"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1029\/2019MS001635"},{"key":"e_1_3_2_40_2","unstructured":"Florian Rudolf Karl Rupp and Josef Weinbub. 2014. ViennaGrid 2.1.0 \u2013 User Manual. (2014). Retrieved February 2021 from http:\/\/viennagrid.sourceforge.net\/viennagrid-manual-current.pdf."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/829576"},{"key":"e_1_3_2_42_2","unstructured":"Nico Schl\u00f6mer. 2021. Meshio: Input\/output for Many Mesh Formats. (2021). Retrieved from https:\/\/github.com\/nschloe\/meshio. [Online; accessed February 2021]."},{"key":"e_1_3_2_43_2","volume-title":"The Visualization Toolkit: An Object-oriented Approach to 3D Graphics","author":"Schroeder W.","year":"2006","unstructured":"W. Schroeder, K. Martin, and B. Lorensen. 2006. The Visualization Toolkit: An Object-oriented Approach to 3D Graphics. Kitware."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2019.07.011"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.2172\/970174"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13144"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3536164","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3536164","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:54Z","timestamp":1750188654000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3536164"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,10]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3536164"],"URL":"https:\/\/doi.org\/10.1145\/3536164","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,10]]},"assertion":[{"value":"2021-04-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}