{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T05:17:53Z","timestamp":1740028673371,"version":"3.37.3"},"reference-count":0,"publisher":"IOS Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016]]},"abstract":"<jats:p>We present a framework, based on the QuickSched [1] library, that implements priority-aware task-based parallelism directly on CUDA GPUs. This allows large computations with complex data dependencies to be executed in a single GPU kernel call, removing any synchronization points that might otherwise be required between kernel calls. Using this paradigm, data transfers to and from the GPU are modelled as load and unload tasks. These tasks are automatically generated and executed alongside the rest of the computational tasks, allowing fully asynchronous and concurrent data transfers. We implemented a tiled-QR decomposition, and a Barnes-Hut gravity calculation, both of which show significant improvement when utilising the task-based setup, effectively eliminating any latencies due to data transfers between the GPU and the CPU. This shows that task-based parallelism is a valid alternative programming paradigm on GPUs, and can provide significant gains from both a data transfer and ease-of-use perspective.<\/jats:p>","DOI":"10.3233\/978-1-61499-621-7-683","type":"book-chapter","created":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T15:30:51Z","timestamp":1739979051000},"source":"Crossref","is-referenced-by-count":0,"title":["Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer"],"prefix":"10.3233","author":[{"family":"Chalk Aidan B.G.","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Gonnet Pedro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Schaller Matthieu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Advances in Parallel Computing","Parallel Computing: On the Road to Exascale"],"original-title":[],"deposited":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T15:48:07Z","timestamp":1739980087000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.medra.org\/servlet\/aliasResolver?alias=iospressISBN&isbn=978-1-61499-620-0&spage=683&doi=10.3233\/978-1-61499-621-7-683"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016]]},"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/978-1-61499-621-7-683","relation":{},"ISSN":["0927-5452"],"issn-type":[{"value":"0927-5452","type":"print"}],"subject":[],"published":{"date-parts":[[2016]]}}}