{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T16:16:06Z","timestamp":1764346566463,"version":"3.46.0"},"reference-count":79,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T00:00:00Z","timestamp":1763769600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T00:00:00Z","timestamp":1763769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003246","name":"Nederlandse Organisatie voor Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["17249","639.023.710","17249","17249"],"award-info":[{"award-number":["17249","639.023.710","17249","17249"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Form Methods Syst Des"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>GPU programs are widely used in industry. To obtain the best performance, a typical development process involves the manual or semi-automatic application of optimizations prior to compiling the code. Such optimizations can introduce errors. To avoid the introduction of errors, we can augment GPU programs with (pre- and postcondition-style) annotations to capture functional properties. However, keeping these annotations correct when optimizing GPU programs is labor-intensive and error-prone.<\/jats:p>\n                  <jats:p>\n                    This paper presents an approach to automatically apply optimizations to GPU programs while preserving provability by defining\n                    <jats:italic>annotation-aware transformations<\/jats:italic>\n                    . It applies frequently-used GPU optimizations, but besides transforming code, it also transforms the annotations. The approach has been implemented in the\n                    <jats:sc>Alpinist<\/jats:sc>\n                    tool and we evaluate\n                    <jats:sc>Alpinist<\/jats:sc>\n                    in combination with the VerCors program verifier, to automatically apply optimizations to a collection of verified programs and reverify them.\n                  <\/jats:p>","DOI":"10.1007\/s10703-025-00480-7","type":"journal-article","created":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T05:32:47Z","timestamp":1763789567000},"page":"316-372","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Preserving provability over GPU program optimizations with annotation-aware transformations"],"prefix":"10.1007","volume":"67","author":[{"given":"\u00d6mer","family":"\u015eakar","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohsen","family":"Safari","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marieke","family":"Huisman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anton","family":"Wijs","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,11,22]]},"reference":[{"issue":"4","key":"480_CR1","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1145\/29873.29875","volume":"9","author":"R Allen","year":"1987","unstructured":"Allen R, Kennedy K (1987) Automatic translation of Fortran programs to vector form. ACM Trans Program Lang Syst (TOPLAS) 9(4):491\u2013542","journal-title":"ACM Trans Program Lang Syst (TOPLAS)"},{"key":"480_CR2","unstructured":"Amighi A (2018) Specification and verification of synchronisation classes in java: A practical approach. PhD thesis, University of Twente"},{"issue":"8","key":"480_CR3","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1145\/2858788.2688521","volume":"50","author":"A Ashari","year":"2015","unstructured":"Ashari A, Tatikonda S, Boehm M et al (2015) On optimizing machine learning workloads via kernel fusion. ACM SIGPLAN Notices 50(8):173\u2013182","journal-title":"ACM SIGPLAN Notices"},{"key":"480_CR4","doi-asserted-by":"crossref","unstructured":"Ashouri A, Killian W, Cavazos J et al (2018) A survey on compiler autotuning using machine learning. ACM Computing Surveys 51(5): 96:1\u201396:42","DOI":"10.1145\/3197978"},{"key":"480_CR5","doi-asserted-by":"crossref","unstructured":"Ayers G, Litz H, Kozyrakis C, et\u00a0al (2020) Classifying memory access patterns for prefetching. In: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp 513\u2013526","DOI":"10.1145\/3373376.3378498"},{"key":"480_CR6","doi-asserted-by":"crossref","unstructured":"Bardsley E, Donaldson A (2014) Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In: NASA Formal Methods, LNCS, vol 8430. Springer, pp 230\u2013245","DOI":"10.1007\/978-3-319-06200-6_18"},{"key":"480_CR7","unstructured":"Bell N, Garland M (2008) Efficient sparse matrix-vector multiplication on CUDA. Tech. rep., Citeseer"},{"key":"480_CR8","doi-asserted-by":"crossref","unstructured":"Berdine J, Calcagno C, O\u2019Hearn P (2005) Smallfoot: Modular Automatic Assertion Checking with Separation Logic. In: de\u00a0Boer F, Bonsangue M, Graf S, et\u00a0al (eds) FMCO, LNCS, vol 4111. Springer, Berlin, Heidelberg, pp 115\u2013137","DOI":"10.1007\/11804192_6"},{"key":"480_CR9","doi-asserted-by":"publisher","unstructured":"Bertolli C, Betts A, Mudalige G, et\u00a0al (2011) Design and performance of the OP2 library for unstructured mesh applications. In: Proceedings of the 1st Workshop on Grids, Clouds and P2P Programming (CGWS), Lecture Notes in Computer Science, vol 7155. Springer, Berlin, Heidelberg, pp 191\u2013200, https:\/\/doi.org\/10.1007\/978-3-642-29737-3_22","DOI":"10.1007\/978-3-642-29737-3_22"},{"key":"480_CR10","doi-asserted-by":"crossref","unstructured":"Betts A, Chong N, Donaldson A et al (2012) GPUVerify: a verifier for GPU kernels. OOPSLA. ACM, New York, USA, pp 113\u2013132","DOI":"10.1145\/2384616.2384625"},{"key":"480_CR11","doi-asserted-by":"publisher","first-page":"376","DOI":"10.1016\/j.scico.2014.03.013","volume":"95","author":"S Blom","year":"2014","unstructured":"Blom S, Huisman M, Mihel\u010di\u0107 M (2014) Specification and verification of GPGPU programs. Sci Comput Program 95:376\u2013388","journal-title":"Sci Comput Program"},{"key":"480_CR12","doi-asserted-by":"crossref","unstructured":"Bornat R, Calcagno C, O\u2019Hearn P, et\u00a0al (2005) Permission accounting in separation logic. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL), pp 259\u2013270","DOI":"10.1145\/1040305.1040327"},{"key":"480_CR13","doi-asserted-by":"crossref","unstructured":"Boyland J (2003) Checking Interference with Fractional Permissions. SAS, LNCS, vol 2694. Springer, Berlin, Heidelberg, pp 55\u201372","DOI":"10.1007\/3-540-44898-5_4"},{"issue":"8","key":"480_CR14","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1145\/2692916.2555253","volume":"49","author":"B Catanzaro","year":"2014","unstructured":"Catanzaro B, Keller A, Garland M (2014) A decomposition for in-place matrix transposition. ACM SIGPLAN Notices 49(8):193\u2013206","journal-title":"ACM SIGPLAN Notices"},{"key":"480_CR15","doi-asserted-by":"crossref","unstructured":"Collingbourne P, Cadar C, Kelly PH (2011) Symbolic testing of OpenCL code. In: Haifa Verification Conference, Springer, pp 203\u2013218","DOI":"10.1007\/978-3-642-34188-5_18"},{"key":"480_CR16","unstructured":"Cowan M, Maleki S, Musuvathi M, et al (2022) Gc3: An optimizing compiler for GPU collective communication. arXiv preprint, https:\/\/www.microsoft.com\/en-us\/research\/publication\/gc3-an-optimizingcompiler-for-gpu-collective-communication\/"},{"key":"480_CR17","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1007\/s10009-020-00576-x","volume":"22","author":"R DeFrancisco","year":"2020","unstructured":"DeFrancisco R, Cho S, Ferdman M et al (2020) Swarm model checking on the GPU. International Journal on Software Tools for Technology Transfer 22:583\u2013599. https:\/\/doi.org\/10.1007\/s10009-020-00576-x","journal-title":"International Journal on Software Tools for Technology Transfer"},{"key":"480_CR18","doi-asserted-by":"crossref","unstructured":"Dross C, Furia CA, Huisman M, et\u00a0al (2021) Verifythis 2019: a program verification competition. International Journal on Software Tools for Technology Transfer pp 1\u201311","DOI":"10.1007\/s10009-021-00619-x"},{"issue":"10","key":"480_CR19","doi-asserted-by":"publisher","first-page":"3934","DOI":"10.1007\/s11227-015-1483-z","volume":"71","author":"J Filipovi\u010d","year":"2015","unstructured":"Filipovi\u010d J, Madzin M, Fousek J et al (2015) Optimizing CUDA code by kernel fusion: application on BLAS. The Journal of Supercomputing 71(10):3934\u20133957","journal-title":"The Journal of Supercomputing"},{"key":"480_CR20","doi-asserted-by":"crossref","unstructured":"Gjomemo R, Namjoshi KS, Phung PH, et\u00a0al (2015) From verification to optimizations. In: International Workshop on Verification, Model Checking, and Abstract Interpretation, Springer, pp 300\u2013317","DOI":"10.1007\/978-3-662-46081-8_17"},{"key":"480_CR21","doi-asserted-by":"publisher","unstructured":"Grauer-Gray S, Xu L, Searles R, et\u00a0al (2012) Auto-tuning a High-Level Language Targeted to GPU Codes. In: Proc. 2012 Innovative Parallel Computing (InPar), IEEE, pp 1\u201310, https:\/\/doi.org\/10.1109\/InPar.2012.6339595","DOI":"10.1109\/InPar.2012.6339595"},{"key":"480_CR22","doi-asserted-by":"publisher","unstructured":"van\u00a0den Haak LB, Wijs A, van\u00a0den Brand M, et\u00a0al (2020) Formal methods for GPGPU programming: is the demand met? In: Proceedings of the 16th International Conference on Integrated Formal Methods (IFM 2020), Lecture Notes in Computer Science, vol 12546. Springer, Berlin, Heidelberg, pp 160\u2013177, https:\/\/doi.org\/10.1007\/978-3-030-63461-2_9","DOI":"10.1007\/978-3-030-63461-2_9"},{"key":"480_CR23","doi-asserted-by":"crossref","unstructured":"Hamers R, Jongmans SS (2020) Safe sessions of channel actions in Clojure: a tour of the discourje project. In: International Symposium on Leveraging Applications of Formal Methods, Springer, pp 489\u2013508","DOI":"10.1007\/978-3-030-61362-4_28"},{"key":"480_CR24","doi-asserted-by":"crossref","unstructured":"He M, Vafeiadis V, Qin S, et\u00a0al (2016) Reasoning about fences and relaxed atomics. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), IEEE, pp 520\u2013527","DOI":"10.1109\/PDP.2016.103"},{"key":"480_CR25","doi-asserted-by":"crossref","unstructured":"Herrmann F, Silberholz J, Tiglio M (2011) Black Hole Simulations with CUDA. In: GPU Computing Gems Emerald Edition. Morgan Kaufmann, Burlington, Massachusetts, chap\u00a08, p 103\u2013111","DOI":"10.1016\/B978-0-12-384988-5.00008-5"},{"key":"480_CR26","doi-asserted-by":"publisher","unstructured":"Hijma P, Heldens S, Sclocco A, et al (2023) Optimization techniques for GPU programming. ACM Computing Surveys 55(11). https:\/\/doi.org\/10.1145\/3570638","DOI":"10.1145\/3570638"},{"key":"480_CR27","doi-asserted-by":"crossref","unstructured":"Hong C, Sukumaran-Rajam A, Nisa I, et\u00a0al (2019) Adaptive sparse tiling for sparse matrix multiplication. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pp 300\u2013314","DOI":"10.1145\/3293883.3295712"},{"key":"480_CR28","unstructured":"Huisman M, Joosten S (2019) A solution to VerifyThis 2019 challenge 1. https:\/\/github.com\/utwente-fmt\/vercors\/blob\/97c49d6dc1097ded47a5ed53143695ace6904865\/examples\/verifythis\/2019\/challenge1.pvl"},{"key":"480_CR29","doi-asserted-by":"crossref","unstructured":"Huisman M, Blom S, Darabi S, et\u00a0al (2018) Program correctness by transformation. In: 8th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), LNCS, vol 11244. Springer, Berlin, Heidelberg","DOI":"10.1007\/978-3-030-03418-4_22"},{"key":"480_CR30","doi-asserted-by":"crossref","unstructured":"Konstantinidis A, Kelly PH, Ramanujam J, et\u00a0al (2013) Parametric GPU code generation for affine loop programs. In: International Workshop on Languages and Compilers for Parallel Computing, Springer, pp 136\u2013151","DOI":"10.1007\/978-3-319-09967-5_8"},{"key":"480_CR31","unstructured":"Lammich P (2022) Refinement of parallel algorithms down to LLVM. In: 13th International Conference on Interactive Theorem Proving (ITP 2022), Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik"},{"key":"480_CR32","unstructured":"Le Q, Ngiam J, Coates A, et\u00a0al (2011) On Optimization Methods for Deep Learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML). Omnipress, New York, USA, pp 265\u2013272"},{"key":"480_CR33","doi-asserted-by":"crossref","unstructured":"Leroy X (2006) Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In: Conference record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp 42\u201354","DOI":"10.1145\/1111037.1111042"},{"issue":"4","key":"480_CR34","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1007\/s10817-009-9155-4","volume":"43","author":"X Leroy","year":"2009","unstructured":"Leroy X (2009) A formally verified compiler back-end. Journal of Automated Reasoning 43(4):363\u2013446","journal-title":"Journal of Automated Reasoning"},{"key":"480_CR35","doi-asserted-by":"crossref","unstructured":"Li G, Gopalakrishnan G (2010) Scalable SMT-based verification of GPU kernel functions. SIGSOFT FSE 2010, Santa Fe, NM. USA. ACM, New York, USA, pp 187\u2013196","DOI":"10.1145\/1882291.1882320"},{"key":"480_CR36","doi-asserted-by":"crossref","unstructured":"Li G, Li P, Sawaya G, et\u00a0al (2012) GKLEE: concolic verification and test generation for GPUs. In: ACM SIGPLAN Notices, ACM, pp 215\u2013224","DOI":"10.1145\/2145816.2145844"},{"issue":"2","key":"480_CR37","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/MM.2008.31","volume":"28","author":"L Lindholm","year":"2008","unstructured":"Lindholm L, Nickolls J, Oberman S et al (2008) NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 28(2):39\u201355. https:\/\/doi.org\/10.1109\/MM.2008.31","journal-title":"IEEE Micro"},{"key":"480_CR38","doi-asserted-by":"publisher","unstructured":"Liou JY, Wang X, Forrest S, et al (2020) Gevo: GPU code optimization using evolutionary computation. ACM Transactions on Architecture and Code Optimization 17(4). https:\/\/doi.org\/10.1145\/3418055","DOI":"10.1145\/3418055"},{"key":"480_CR39","doi-asserted-by":"publisher","unstructured":"Liu X, Tan S, Wang H (2012) Parallel statistical analysis of analog circuits by GPU-accelerated graph-based approach. In: Proceedings of the 2012 Conference and Exhibition on Design, Automation & Test in Europe (DATE). IEEE Computer Society, Washington, DC, pp 852\u2013857, https:\/\/doi.org\/10.1109\/DATE.2012.6176615","DOI":"10.1109\/DATE.2012.6176615"},{"key":"480_CR40","doi-asserted-by":"crossref","unstructured":"de\u00a0Moura LM, Bj\u00f8rner N (2008) Z3: An efficient SMT solver. In: Ramakrishnan C, Rehof J (eds) TACAS, LNCS, vol 4963. Springer, Berlin, Heidelberg, pp 337\u2013340","DOI":"10.1007\/978-3-540-78800-3_24"},{"key":"480_CR41","doi-asserted-by":"crossref","unstructured":"M\u00fcller P, Schwerhoff M, Summers A (2016) Viper\u2013a verification infrastructure for permission-based reasoning. In: VMCAI","DOI":"10.1007\/978-3-662-49122-5_2"},{"key":"480_CR42","doi-asserted-by":"crossref","unstructured":"Murthy GS, Ravishankar M, Baskaran MM, et\u00a0al (2010) Optimal loop unrolling for GPGPU programs. In: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), IEEE, pp 1\u201311","DOI":"10.1109\/IPDPS.2010.5470423"},{"key":"480_CR43","doi-asserted-by":"crossref","unstructured":"Namjoshi KS, Pavlinovic Z (2018) The impact of program transformations on static program analysis. In: International Static Analysis Symposium, Springer, pp 306\u2013325","DOI":"10.1007\/978-3-319-99725-4_19"},{"key":"480_CR44","doi-asserted-by":"crossref","unstructured":"Namjoshi KS, Singhania N (2016) Loopy: Programmable and formally verified loop transformations. In: International Static Analysis Symposium, Springer, pp 383\u2013402","DOI":"10.1007\/978-3-662-53413-7_19"},{"key":"480_CR45","doi-asserted-by":"crossref","unstructured":"Namjoshi KS, Xue A (2021) A self-certifying compilation framework for WebAssembly. In: International Conference on Verification, Model Checking, and Abstract Interpretation, Springer, pp 127\u2013148","DOI":"10.1007\/978-3-030-67067-2_7"},{"key":"480_CR46","unstructured":"OpenCL (2011) The OpenCL 1.2 specification"},{"key":"480_CR47","doi-asserted-by":"crossref","unstructured":"Osama M, Wijs A (2019) Parallel SAT Simplification on GPU architectures. In: TACAS, Part I, LNCS, vol 11427. Springer, Berlin, Heidelberg, pp 21\u201340","DOI":"10.1007\/978-3-030-17462-0_2"},{"key":"480_CR48","doi-asserted-by":"publisher","unstructured":"Osama M, Wijs A, Biere A (2021) SAT solving with GPU accelerated Inprocessing. In: Proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Part I, Lecture Notes in Computer Science, vol 12651. Springer, Berlin, Heidelberg, pp 133\u2013151, https:\/\/doi.org\/10.1007\/978-3-030-72016-2_8","DOI":"10.1007\/978-3-030-72016-2_8"},{"key":"480_CR49","doi-asserted-by":"crossref","unstructured":"de\u00a0Putter S, Wijs A (2016) Verifying a verifier: on the formal correctness of an LTS transformation verification technique. In: International Conference on Fundamental Approaches to Software Engineering, Springer, pp 383\u2013400","DOI":"10.1007\/978-3-662-49665-7_23"},{"issue":"6","key":"480_CR50","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1145\/2499370.2462176","volume":"48","author":"J Ragan-Kelley","year":"2013","unstructured":"Ragan-Kelley J, Barnes C, Adams A et al (2013) Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM Sigplan Notices 48(6):519\u2013530","journal-title":"ACM Sigplan Notices"},{"issue":"8","key":"480_CR51","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4053","volume":"29","author":"RC Rocha","year":"2017","unstructured":"Rocha RC, Pereira AD, Ramos L et al (2017) Toast: automatic tiling for iterative stencil computations on GPUs. Concurrency and Computation: Practice and Experience 29(8):e4053","journal-title":"Concurrency and Computation: Practice and Experience"},{"key":"480_CR52","doi-asserted-by":"publisher","unstructured":"Safari M (2022) Correct optimized GPU programs. PhD Thesis\u2013Research UT, graduation UT, University of Twente, Netherlands, https:\/\/doi.org\/10.3990\/1.9789036553421","DOI":"10.3990\/1.9789036553421"},{"key":"480_CR53","doi-asserted-by":"crossref","unstructured":"Safari M, Huisman M (2020a) Formal verification of parallel stream compaction and summed-area table algorithms. In: International Colloquium on Theoretical Aspects of Computing, Springer, pp 181\u2013199","DOI":"10.1007\/978-3-030-64276-1_10"},{"key":"480_CR54","doi-asserted-by":"crossref","unstructured":"Safari M, Huisman M (2020b) A generic approach to the verification of the permutation property of sequential and parallel swap-based sorting algorithms. In: International Conference on Integrated Formal Methods, Springer, pp 257\u2013275","DOI":"10.1007\/978-3-030-63461-2_14"},{"key":"480_CR55","doi-asserted-by":"crossref","unstructured":"Safari M, Oortwijn W, Joosten S, et\u00a0al (2020) Formal verification of parallel prefix sum. In: NASA Formal Methods Symposium, Springer, pp 170\u2013186","DOI":"10.1007\/978-3-030-55754-6_10"},{"key":"480_CR56","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1007\/978-3-030-88806-0_17","volume-title":"Static Analysis","author":"M Safari","year":"2021","unstructured":"Safari M, Oortwijn W, Huisman M (2021) Automated verification of the parallel Bellman-Ford algorithm. In: Dr\u0103goi C, Mukherjee S, Namjoshi K (eds) Static Analysis. Springer International Publishing, Cham, pp 346\u2013358"},{"key":"480_CR57","unstructured":"\u015eakar \u00d6 (2020) Extending support for axiomatic data types in VerCors. http:\/\/essay.utwente.nl\/80892\/"},{"key":"480_CR58","doi-asserted-by":"publisher","unstructured":"\u015eakar \u00d6, Safari M, Huisman M, et\u00a0al (2022a) Alpinist: An Annotation-Aware GPU Program Optimizer, Springer, Berlin, Heidelberg, p 332\u2013352. Lecture Notes in Computer Science, https:\/\/doi.org\/10.1007\/978-3-030-99527-0_18","DOI":"10.1007\/978-3-030-99527-0_18"},{"key":"480_CR59","unstructured":"\u015eakar \u00d6, Safari M, Huisman M, et\u00a0al (2022b) The repository for the examples used in Alpinist. https:\/\/github.com\/OmerSakar\/Alpinist-Examples.git"},{"issue":"10","key":"480_CR60","doi-asserted-by":"publisher","first-page":"9955","DOI":"10.1364\/OE.18.009955","volume":"18","author":"T Shimobaba","year":"2010","unstructured":"Shimobaba T, Ito T, Masuda N et al (2010) Fast calculation of computer-generated-hologram on AMD HD5000 series GPU and OpenCL. Optics Express 18(10):9955\u20139960","journal-title":"Optics Express"},{"key":"480_CR61","doi-asserted-by":"crossref","unstructured":"Sundfeld D, Havgaard JH, Gorodkin J, et\u00a0al (2017) CUDA-Sankoff: using GPU to accelerate the pairwise structural RNA alignment. In: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, pp 295\u2013302","DOI":"10.1109\/PDP.2017.15"},{"key":"480_CR62","unstructured":"The CUDA team (2021) Documentation of the CUDA unroll pragma. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#pragma-unroll, accessed Oct. 6, 2021"},{"key":"480_CR63","unstructured":"The Halide team (2021) Documentation of the Halide unroll function. https:\/\/halide-lang.org\/docs\/class_halide_1_1_func.html#a05935caceb6efb8badd85f306dd33034, accessed Oct. 6, 2021"},{"key":"480_CR64","unstructured":"TicTacToeMatrixGrid (2022) The verification of tictactoe program. https:\/\/github.com\/utwente-fmt\/vercors\/blob\/0a2fdc24419466c2d3b7a853a2908c37e7a8daa7\/examples\/session-generate\/MatrixGrid.pvl"},{"key":"480_CR65","doi-asserted-by":"crossref","unstructured":"Unkule S, Shaltz C, Qasem A (2012) Automatic restructuring of GPU kernels for exploiting inter-thread data locality. In: International Conference on Compiler Construction, Springer, pp 21\u201340","DOI":"10.1007\/978-3-642-28652-0_2"},{"key":"480_CR66","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.future.2013.09.003","volume":"30","author":"B Van Werkhoven","year":"2014","unstructured":"Van Werkhoven B, Maassen J, Bal HE et al (2014) Optimizing convolution operations on GPUs using adaptive tiling. Futur Gener Comput Syst 30:14\u201326","journal-title":"Futur Gener Comput Syst"},{"key":"480_CR67","unstructured":"VerCors1.4 (2024) VerCors version 1.4. https:\/\/github.com\/utwente-fmt\/vercors\/tree\/v1"},{"key":"480_CR68","unstructured":"VerCors2.0 (2023) The developer branch of VerCors version 2.0. https:\/\/github.com\/utwentefmt\/vercors\/tree\/0b6a3b2eb21192198647548bcb9cb9b559a1ed64"},{"key":"480_CR69","unstructured":"Viper team (2016) Viper project website. http:\/\/www.pm.inf.ethz.ch\/research\/viper, http:\/\/www.pm.inf.ethz.ch\/research\/viper"},{"key":"480_CR70","doi-asserted-by":"crossref","unstructured":"Wahib M, Maruyama N (2014) Scalable kernel fusion for memory-bound GPU applications. In: SC\u201914: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, pp 191\u2013202","DOI":"10.1109\/SC.2014.21"},{"key":"480_CR71","doi-asserted-by":"crossref","unstructured":"Wang G, Lin Y, Yi W (2010) Kernel fusion: an effective method for better power efficiency on multithreaded GPU. In: 2010 IEEE\/ACM Int\u2019l Conference on Green Computing and Communications & Int\u2019l Conference on Cyber, Physical and Social Computing, IEEE, pp 344\u2013350","DOI":"10.1109\/GreenCom-CPSCom.2010.102"},{"key":"480_CR72","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1016\/j.future.2018.08.004","volume":"90","author":"B Van Werkhoven","year":"2019","unstructured":"Van Werkhoven B (2019) Kernel tuner: a search-optimizing GPU code auto-tuner. Futur Gener Comput Syst 90:347\u2013358","journal-title":"Futur Gener Comput Syst"},{"key":"480_CR73","doi-asserted-by":"publisher","unstructured":"Wienke S, Springer P, Terboven C, et\u00a0al (2012) OpenACC\u2013first experiences with real-world applications. In: Proceedings of the 18th European Conference on Parallel and Distributed Computing (EuroPar), Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg, pp 859\u2013870, https:\/\/doi.org\/10.1007\/978-3-642-32820-6_85","DOI":"10.1007\/978-3-642-32820-6_85"},{"key":"480_CR74","doi-asserted-by":"crossref","unstructured":"Wijs A (2016) BFS-based model checking of linear-time properties with an application on GPUs. In: CAV, Part II, LNCS, vol 9780. Springer, Berlin, Heidelberg, pp 472\u2013493","DOI":"10.1007\/978-3-319-41540-6_26"},{"key":"480_CR75","doi-asserted-by":"crossref","unstructured":"Wijs A, Engelen L (2014) REFINER: towards formal verification of model transformations. In: NFM. Springer, Cham, LNCS, pp 258\u2013263","DOI":"10.1007\/978-3-319-06200-6_21"},{"key":"480_CR76","doi-asserted-by":"publisher","unstructured":"Wijs A, Neele T, Bo\u0161na\u010dki D (2016) GPUexplore 2.0: unleashing GPU explicit-state model checking. In: Proceedings of the 21st International Symposium on Formal Methods, Lecture Notes in Computer Science, vol 9995. Springer, Berlin, Heidelberg, pp 694\u2013701, https:\/\/doi.org\/10.1007\/978-3-319-48989-6_42","DOI":"10.1007\/978-3-319-48989-6_42"},{"key":"480_CR77","doi-asserted-by":"crossref","unstructured":"Wu H, Diamos G, Wang J, et\u00a0al (2012) Optimizing data warehousing applications for GPUs using kernel fusion\/fission. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IEEE, pp 2433\u20132442","DOI":"10.1109\/IPDPSW.2012.300"},{"key":"480_CR78","doi-asserted-by":"crossref","unstructured":"Xu C, Kirk SR, Jenkins S (2009) Tiling for performance tuning on different models of GPUs. In: 2009 Second International Symposium on Information Science and Engineering, IEEE, pp 500\u2013504","DOI":"10.1109\/ISISE.2009.60"},{"issue":"6","key":"480_CR79","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1145\/1809028.1806606","volume":"45","author":"Y Yang","year":"2010","unstructured":"Yang Y, Xiang P, Kong J et al (2010) A GPGPU compiler for memory optimization and parallelism management. ACM Sigplan Notices 45(6):86\u201397","journal-title":"ACM Sigplan Notices"}],"container-title":["Formal Methods in System Design"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10703-025-00480-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10703-025-00480-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10703-025-00480-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T16:13:55Z","timestamp":1764346435000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10703-025-00480-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,22]]},"references-count":79,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["480"],"URL":"https:\/\/doi.org\/10.1007\/s10703-025-00480-7","relation":{},"ISSN":["0925-9856","1572-8102"],"issn-type":[{"type":"print","value":"0925-9856"},{"type":"electronic","value":"1572-8102"}],"subject":[],"published":{"date-parts":[[2025,11,22]]},"assertion":[{"value":"16 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no Conflict of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}