{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:14:18Z","timestamp":1771697658292,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2015,12,5]],"date-time":"2015-12-05T00:00:00Z","timestamp":1449273600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Google Faculty Award"},{"DOI":"10.13039\/100007065","name":"Nvidia","doi-asserted-by":"publisher","award":["GPU device donation"],"award-info":[{"award-number":["GPU device donation"]}],"id":[{"id":"10.13039\/100007065","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1455404, 1525609, Career Award"],"award-info":[{"award-number":["1455404, 1525609, Career Award"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"DOE Early Career Award"},{"name":"IBM CAS Fellowship"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2015,12,5]]},"DOI":"10.1145\/2830772.2830818","type":"proceedings-article","created":{"date-parts":[[2016,1,11]],"date-time":"2016-01-11T13:38:13Z","timestamp":1452519493000},"page":"407-419","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":43,"title":["Free launch"],"prefix":"10.1145","author":[{"given":"Guoyang","family":"Chen","sequence":"first","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xipeng","family":"Shen","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,12,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Lonestar: A suite of parallel irregular programs,\" in Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software","author":"Kulkarni M.","year":"2009","unstructured":"M. Kulkarni , M. Burtscher , K. Pingali , and C. Cascaval , \" Lonestar: A suite of parallel irregular programs,\" in Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software , 2009 . M. Kulkarni, M. Burtscher, K. Pingali, and C. Cascaval, \"Lonestar: A suite of parallel irregular programs,\" in Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software, 2009."},{"key":"e_1_3_2_1_2_1","unstructured":"S. Jones \"Introduction to dynamic parallelism \" in Nvidia GPU Technology Conference (San Jose CA) May 2012.  S. Jones \"Introduction to dynamic parallelism \" in Nvidia GPU Technology Conference (San Jose CA) May 2012."},{"key":"e_1_3_2_1_3_1","unstructured":"\"OpenCL.\" http:\/\/www.khronos.org\/opencl\/.  \"OpenCL.\" http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_3_2_1_4_1","volume-title":"October","author":"Wang J.","year":"2014","unstructured":"J. Wang and S. Yalamanchili , \" Characterization and analysis of dynamic parallelism in unstructured gpu applications,\" in 2014 IEEE International Symposium on Workload Characterization , October 2014 . J. Wang and S. Yalamanchili, \"Characterization and analysis of dynamic parallelism in unstructured gpu applications,\" in 2014 IEEE International Symposium on Workload Characterization, October 2014."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750393"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2692916.2555254"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.24"},{"key":"e_1_3_2_1_8_1","first-page":"1","volume-title":"A study of persistent threads style gpu programming for gpgpu workloads,\" in Innovative Parallel Computing (InPar)","author":"Gupta K.","year":"2012","unstructured":"K. Gupta , J. A. Stuart , and J. D. Owens , \" A study of persistent threads style gpu programming for gpgpu workloads,\" in Innovative Parallel Computing (InPar) , 2012 , pp. 1 -- 14 , IEEE , 2012. K. Gupta, J. A. Stuart, and J. D. Owens, \"A study of persistent threads style gpu programming for gpgpu workloads,\" in Innovative Parallel Computing (InPar), 2012, pp. 1--14, IEEE, 2012."},{"key":"e_1_3_2_1_9_1","first-page":"539","volume-title":"Cetus - an extensible compiler infrastructure for source-to-source transformation,\" in In Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC)","author":"Lee S.","year":"2003","unstructured":"S. Lee , T. Johnson , and R. Eigenmann , \" Cetus - an extensible compiler infrastructure for source-to-source transformation,\" in In Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC) , pp. 539 -- 553 , 2003 . S. Lee, T. Johnson, and R. Eigenmann, \"Cetus - an extensible compiler infrastructure for source-to-source transformation,\" in In Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 539--553, 2003."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_2"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735702"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941597"},{"key":"e_1_3_2_1_13_1","volume-title":"Introduction to Algorithms","author":"Cormen T. H.","year":"2002","unstructured":"T. H. Cormen , C. E. Leiserson , R. L. Rivest , and C. Stein , Introduction to Algorithms . MIT Press and McGraw - Hill , 2002 . T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press and McGraw - Hill, 2002."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2668930.2688046"},{"key":"e_1_3_2_1_15_1","unstructured":"\"NVIDIA CUDA.\" http:\/\/www.nvidia.com\/cuda.  \"NVIDIA CUDA.\" http:\/\/www.nvidia.com\/cuda."},{"key":"e_1_3_2_1_16_1","volume-title":"Performance impact of dynamic parallelism on different clustering algorithms,\" in SPIE Defense, Security, and Sensing","author":"DiMarco J.","year":"2013","unstructured":"J. DiMarco and M. Taufer , \" Performance impact of dynamic parallelism on different clustering algorithms,\" in SPIE Defense, Security, and Sensing , International Society for Optics and Photonics , 2013 . J. DiMarco and M. Taufer, \"Performance impact of dynamic parallelism on different clustering algorithms,\" in SPIE Defense, Security, and Sensing, International Society for Optics and Photonics, 2013."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751213"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950408"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810104"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442523"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.16"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504181"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-009-0340-3"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_8"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.20"},{"key":"e_1_3_2_1_26_1","first-page":"185","volume-title":"2013 IEEE International Symposium on","author":"Che S.","year":"2013","unstructured":"S. Che , B. M. Beckmann , S. K. Reinhardt , and K. Skadron , \" Pannotia: Understanding irregular gpgpu graph applications,\" in Workload Characterization (IISWC) , 2013 IEEE International Symposium on , pp. 185 -- 195 , IEEE, 2013 . S. Che, B. M. Beckmann, S. K. Reinhardt, and K. Skadron, \"Pannotia: Understanding irregular gpgpu graph applications,\" in Workload Characterization (IISWC), 2013 IEEE International Symposium on, pp. 185--195, IEEE, 2013."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2012.6402918"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1040305.1040336"}],"event":{"name":"MICRO-48: The 48th Annual IEEE\/ACM International Symposium of Microarchitecture","location":"Waikiki Hawaii","acronym":"MICRO-48","sponsor":["IEEE Computer Society TC-uARCH","SIGMICRO ACM Special Interest Group on Microarchitectural Research and Processing"]},"container-title":["Proceedings of the 48th International Symposium on Microarchitecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2830772.2830818","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2830772.2830818","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:48:40Z","timestamp":1750225720000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2830772.2830818"}},"subtitle":["optimizing GPU dynamic kernel launches through thread reuse"],"short-title":[],"issued":{"date-parts":[[2015,12,5]]},"references-count":28,"alternative-id":["10.1145\/2830772.2830818","10.1145\/2830772"],"URL":"https:\/\/doi.org\/10.1145\/2830772.2830818","relation":{},"subject":[],"published":{"date-parts":[[2015,12,5]]},"assertion":[{"value":"2015-12-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}