{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T02:36:56Z","timestamp":1774579016867,"version":"3.50.1"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2014,11,19]],"date-time":"2014-11-19T00:00:00Z","timestamp":1416355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002428","name":"Austrian Science Fund","doi-asserted-by":"publisher","award":["P23329"],"award-info":[{"award-number":["P23329"]}],"id":[{"id":"10.13039\/501100002428","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2014,11,19]]},"abstract":"<jats:p>In this paper, we present Whippletree, a novel approach to scheduling dynamic, irregular workloads on the GPU. We introduce a new programming model which offers the simplicity and expressiveness of task-based parallelism while retaining all aspects of the multi-level execution hierarchy essential to unlocking the full potential of a modern GPU. At the same time, our programming model lends itself to efficient implementation on the SIMD-based architecture typical of a current GPU. We demonstrate the practical utility of our model by providing a reference implementation on top of current CUDA hardware. Furthermore, we show that our model compares favorably to traditional approaches in terms of both performance as well as the range of applications that can be covered. 
We demonstrate the benefits of our model for recursive Reyes rendering, procedural geometry generation and volume rendering with concurrent irradiance caching.<\/jats:p>","DOI":"10.1145\/2661229.2661250","type":"journal-article","created":{"date-parts":[[2014,11,18]],"date-time":"2014-11-18T14:21:03Z","timestamp":1416320463000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":50,"title":["Whippletree"],"prefix":"10.1145","volume":"33","author":[{"given":"Markus","family":"Steinberger","sequence":"first","affiliation":[{"name":"Graz University of Technology, Austria"}]},{"given":"Michael","family":"Kenzel","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Austria"}]},{"given":"Pedro","family":"Boechat","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Austria"}]},{"given":"Bernhard","family":"Kerbl","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Austria"}]},{"given":"Mark","family":"Dokter","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Austria"}]},{"given":"Dieter","family":"Schmalstieg","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Austria"}]}],"member":"320","published-online":{"date-parts":[[2014,11,19]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1572769.1572792"},{"key":"e_1_2_2_2_1","first-page":"373","article-title":"Static GPU threads and an improved scan algorithm","volume":"2010","author":"Breitbart J.","year":"2011","unstructured":"Breitbart , J. 2011 . Static GPU threads and an improved scan algorithm . In Proc. Euro-Par 2010 , 373 -- 380 . Breitbart, J. 2011. Static GPU threads and an improved scan algorithm. In Proc. Euro-Par 2010, 373--380.","journal-title":"Proc. Euro-Par"},{"key":"e_1_2_2_3_1","volume-title":"Proc. 
Graphics Hardware, 57--64","author":"Cederman D.","unstructured":"Cederman, D., and Tsigas, P. 2008. On dynamic load balancing on graphics processors. In Proc. Graphics Hardware, 57--64."},{"key":"e_1_2_2_4_1","volume-title":"Proc. Languages and Compilers for Parallel Computing.","author":"Chatterjee S.","unstructured":"Chatterjee, S., Grossman, M., Sbirlea, A., and Sarkar, V. 2011. Dynamic task parallelism with a GPU work-stealing runtime system. In Proc. Languages and Compilers for Parallel Computing."},{"key":"e_1_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Chen L. Villa O. Krishnamoorthy S. and Gao G. 2010. Dynamic load balancing on single- and multi-gpu systems. In IEEE Parallel Distributed Processing.","DOI":"10.1109\/IPDPS.2010.5470413"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/37402.37414"},{"key":"e_1_2_2_7_1","unstructured":"Hargreaves S. 2005. Generating shaders from HLSL fragments. ShaderX3: Advanced rendering with DirectX and OpenGL."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1572769.1572797"},{"key":"e_1_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Kroes T. Post F. H. and Botha C. P. 2012. Exposure render: An interactive photo-realistic volume rendering framework. PLoS ONE 7 7 (07).  Kroes T. Post F. H. and Botha C. P. 2012. Exposure render: An interactive photo-realistic volume rendering framework. 
PLoS ONE 7 7 (07).","DOI":"10.1371\/journal.pone.0038586"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2492045.2492060"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1730804.1730817"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837274.1837289"},{"key":"e_1_2_2_13_1","unstructured":"NVIDIA. 2012. CUDA Dynamic Parallelism Programming Guide."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778803"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409060.1409096"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161005"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366180"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.12312"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161065"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1477926.1477930"},{"key":"e_1_2_2_21_1","volume-title":"Proc. HPG, 29--37","author":"Tzeng S.","unstructured":"Tzeng, S., Patney, A., and Owens, J. D. 2010. Task management for irregular-parallel workloads on the GPU. In Proc. HPG, 29--37."},{"key":"e_1_2_2_22_1","unstructured":"Xiao S. and Feng W. 2010. Inter-block GPU communication via fast barrier synchronization. In IEEE Parallel Distributed Processing.  Xiao S. and Feng W. 2010. Inter-block GPU communication via fast barrier synchronization. 
In IEEE Parallel Distributed Processing ."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442539"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618501"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2661229.2661250","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2661229.2661250","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:05Z","timestamp":1750227185000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2661229.2661250"}},"subtitle":["task-based scheduling of dynamic workloads on the GPU"],"short-title":[],"issued":{"date-parts":[[2014,11,19]]},"references-count":24,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2014,11,19]]}},"alternative-id":["10.1145\/2661229.2661250"],"URL":"https:\/\/doi.org\/10.1145\/2661229.2661250","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,11,19]]},"assertion":[{"value":"2014-11-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}