{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:11:56Z","timestamp":1775873516353,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"United States Air Force and DARPA","award":["FA8750-18-C-0092"],"award-info":[{"award-number":["FA8750-18-C-0092"]}]},{"name":"National Science Foundation","award":["1801369, 1845514"],"award-info":[{"award-number":["1801369, 1845514"]}]},{"name":"SHF","award":["1812876 and 2007784"],"award-info":[{"award-number":["1812876 and 2007784"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>Motivated by the increasing importance of general-purpose Graphic Processing Units (GPGPU) programming, exemplified by NVIDIA\u2019s CUDA framework, as well as the difficulty, especially for novice programmers, of reasoning about performance in GPGPU kernels, we introduce a novel quantitative program logic for CUDA kernels. The logic allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks: warp divergences, uncoalesced memory accesses, and bank conflicts. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic, and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of RaCUDA, an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.<\/jats:p>","DOI":"10.1145\/3639403","type":"journal-article","created":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T11:16:16Z","timestamp":1705058176000},"page":"1-53","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Modeling and Analyzing Evaluation Cost of CUDA Kernels"],"prefix":"10.1145","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3210-9727","authenticated-orcid":false,"given":"Stefan K.","family":"Muller","sequence":"first","affiliation":[{"name":"Illinois Institute of Technology, Chicago, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8326-0788","authenticated-orcid":false,"given":"Jan","family":"Hoffmann","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,3,12]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25318-8_19"},{"key":"e_1_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-63387-9_25"},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11957-6_6"},{"key":"e_1_3_3_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-32149-3_53"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/224164.224210"},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/232629.232650"},{"key":"e_1_3_3_8_1","volume-title":"Proceedings of the 3rd Workshop on Software Tools for MultiCore Systems","author":"Boyer Michael","year":"2008","unstructured":"Michael Boyer, Kevin Skadron, and Westley Weimer. 2008. Automated dynamic analysis of CUDA programs. In Proceedings of the 3rd Workshop on Software Tools for MultiCore Systems."},{"key":"e_1_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/321812.321815"},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594301"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-63390-9_4"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737955"},{"key":"e_1_3_3_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-81685-8_19"},{"key":"e_1_3_3_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858949.2784749"},{"key":"e_1_3_3_15_1","volume-title":"Proceedings of the 23rd International Conference on Functional Programming (ICFP\u201918)","author":"Das Ankush","year":"2018","unstructured":"Ankush Das, Jan Hoffmann, and Frank Pfenning. 2018. Parallel complexity analysis with temporal session types. In Proceedings of the 23rd International Conference on Functional Programming (ICFP\u201918)."},{"key":"e_1_3_3_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.21127"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062342"},{"key":"e_1_3_3_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-89884-1_19"},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1480881.1480898"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964179.1964184"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2362389.2362393"},{"key":"e_1_3_3_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3009837.3009842"},{"key":"e_1_3_3_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-46669-8_6"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/640128.604148"},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102518"},{"key":"e_1_3_3_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205298"},{"key":"e_1_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062373"},{"key":"e_1_3_3_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3001834"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/LICS.2011.22"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882291.1882320"},{"key":"e_1_3_3_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370036.2145844"},{"key":"e_1_3_3_32_1","doi-asserted-by":"publisher","DOI":"10.4204\/EPTCS.356.4"},{"key":"e_1_3_3_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2935764.2935793"},{"key":"e_1_3_3_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236790"},{"key":"e_1_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3434306"},{"key":"e_1_3_3_36_1","doi-asserted-by":"crossref","unstructured":"Stefan K. Muller and Jan Hoffmann. 2023. Modeling and Analyzing Evaluation Cost of CUDA Kernels. Retrieved from DOI:https:\/\/zenodo.org\/records\/10392393","DOI":"10.1145\/3639403"},{"key":"e_1_3_3_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192394"},{"key":"e_1_3_3_38_1","volume-title":"CUDA C Programming Guide v.10.1.168","author":"Corporation NVIDIA","year":"2019","unstructured":"NVIDIA Corporation. 2019. CUDA C Programming Guide v.10.1.168."},{"key":"e_1_3_3_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192368"},{"key":"e_1_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851613.2851830"},{"key":"e_1_3_3_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158124"},{"key":"e_1_3_3_42_1","first-page":"159","volume-title":"Proceedings of the Joint Workshop of the German Research Training Groups in Computer Science, Algorithmic synthesis of reactive and discrete-continuous systems (AlgoSyn\u201910)","author":"Rodriguez Dulma","year":"2010","unstructured":"Dulma Rodriguez. 2010. A type system for amortised heap-space analysis of object-oriented programs. In Proceedings of the Joint Workshop of the German Research Training Groups in Computer Science, Algorithmic synthesis of reactive and discrete-continuous systems (AlgoSyn\u201910), Kai Bollue, Dominique G\u00fcckel, Ulrich Loup, Jacob Sp\u00f6nemann, and Melanie Winkler (Eds.). Verlagshaus Mainz, Aachen, Germany, 159."},{"key":"e_1_3_3_43_1","doi-asserted-by":"crossref","unstructured":"Ilia Shumailov Yiren Zhao Daniel Bates Nicolas Papernot Robert Mullins and Ross Anderson. 2020. Sponge Examples: Energy-Latency Attacks on Neural Networks. Retrieved from https:\/\/arxiv.org\/abs\/2006.03463","DOI":"10.1109\/EuroSP51992.2021.00024"},{"key":"e_1_3_3_44_1","volume-title":"Static Analysis for GPU Program Performance","author":"Singhania Nimit","year":"2018","unstructured":"Nimit Singhania. 2018. Static Analysis for GPU Program Performance. Ph.D. Dissertation. Computer and Information Science, University of Pennsylvania."},{"key":"e_1_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-08867-9_50"},{"key":"e_1_3_3_46_1","unstructured":"Mingyuan Wu Husheng Zhou Lingming Zhang Cong Liu and Yuqun Zhang. 2019. Characterizing and detecting CUDA program bugs. Retrieved from http:\/\/arxiv.org\/abs\/1905.01833"},{"key":"e_1_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941574"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639403","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639403","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:54:11Z","timestamp":1750287251000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639403"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3639403"],"URL":"https:\/\/doi.org\/10.1145\/3639403","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"value":"2329-4949","type":"print"},{"value":"2329-4957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,12]]},"assertion":[{"value":"2023-06-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}