{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T19:11:43Z","timestamp":1771960303548,"version":"3.50.1"},"reference-count":14,"publisher":"Association for Computing Machinery (ACM)","issue":"POPL","license":[{"start":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T00:00:00Z","timestamp":1609718400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["FA8750-18-C-0092"],"award-info":[{"award-number":["FA8750-18-C-0092"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["SaTC 1801369, CAREER 1845514, SHF 1812876, SHF 2007784"],"award-info":[{"award-number":["SaTC 1801369, CAREER 1845514, SHF 1812876, SHF 2007784"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2021,1,4]]},"abstract":"<jats:p>General-purpose programming on GPUs (GPGPU) is becoming increasingly in vogue as applications such as machine learning and scientific computing demand high throughput in vector-parallel applications. NVIDIA's CUDA toolkit seeks to make GPGPU programming accessible by allowing programmers to write GPU functions, called kernels, in a small extension of C\/C++. However, due to CUDA's complex execution model, the performance characteristics of CUDA kernels are difficult to predict, especially for novice programmers.<\/jats:p>\n          <jats:p>This paper introduces a novel quantitative program logic for CUDA kernels, which allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of RaCuda, an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of CUDA benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.<\/jats:p>","DOI":"10.1145\/3434306","type":"journal-article","created":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T17:34:24Z","timestamp":1609781664000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Modeling and analyzing evaluation cost of CUDA kernels"],"prefix":"10.1145","volume":"5","author":[{"given":"Stefan K.","family":"Muller","sequence":"first","affiliation":[{"name":"Illinois Institute of Technology, USA"}]},{"given":"Jan","family":"Hoffmann","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,1,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25318-8_19"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/LICS.2011.22"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882291.1882320"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370036.2145844"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2935764.2935793"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192394"},{"key":"e_1_2_1_7_1","unstructured":"NVIDIA Corporation. 2019. CUDA C Programming Guide v. 10.1.168.  NVIDIA Corporation. 2019. CUDA C Programming Guide v. 10.1.168."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192368"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851613.2851830"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158124"},{"key":"e_1_2_1_11_1","volume-title":"Sponge Examples: Energy-Latency Attacks on Neural Networks. arXiv:cs.LG\/","author":"Shumailov Ilia","year":"2020","unstructured":"Ilia Shumailov , Yiren Zhao , Daniel Bates , Nicolas Papernot , Robert Mullins , and Ross Anderson . 2020 . Sponge Examples: Energy-Latency Attacks on Neural Networks. arXiv:cs.LG\/ 2006.03463 Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert Mullins, and Ross Anderson. 2020. Sponge Examples: Energy-Latency Attacks on Neural Networks. arXiv:cs.LG\/ 2006.03463"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-08867-9_50"},{"key":"e_1_2_1_14_1","volume-title":"Characterizing and Detecting CUDA Program Bugs. CoRR abs\/","author":"Wu Mingyuan","year":"1905","unstructured":"Mingyuan Wu , Husheng Zhou , Lingming Zhang , Cong Liu , and Yuqun Zhang . 2019. Characterizing and Detecting CUDA Program Bugs. CoRR abs\/ 1905 . 01833 ( 2019 ). arXiv: 1905. 01833 http:\/\/arxiv.org\/abs\/ 1905.01833 Mingyuan Wu, Husheng Zhou, Lingming Zhang, Cong Liu, and Yuqun Zhang. 2019. Characterizing and Detecting CUDA Program Bugs. CoRR abs\/ 1905. 01833 ( 2019 ). arXiv: 1905. 01833 http:\/\/arxiv.org\/abs\/ 1905.01833"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941574"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3434306","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3434306","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3434306","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:35Z","timestamp":1750195475000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3434306"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,4]]},"references-count":14,"journal-issue":{"issue":"POPL","published-print":{"date-parts":[[2021,1,4]]}},"alternative-id":["10.1145\/3434306"],"URL":"https:\/\/doi.org\/10.1145\/3434306","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,4]]},"assertion":[{"value":"2021-01-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}