{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:35:33Z","timestamp":1750307733804,"version":"3.41.0"},"reference-count":14,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2008,12,20]],"date-time":"2008-12-20T00:00:00Z","timestamp":1229731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2008,12,20]]},"abstract":"<jats:p>In this paper we take a look at GPU-Quicksort, an efficient Quicksort algorithm suitable for the highly parallel multi-core graphics processors. Quicksort had previously been considered an inefficient sorting solution for graphics processors, but GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.<\/jats:p>\n          <jats:p>We also take look at a comparison of different load balancing schemes. To get maximum performance on the many-core graphics processors it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. With the recent advent of scatter operations and atomic hardware primitives it is now possible to bring some of the more elaborate dynamic load balancing schemes from the conventional SMP systems domain to the graphics processor domain.<\/jats:p>","DOI":"10.1145\/1556444.1556447","type":"journal-article","created":{"date-parts":[[2009,6,24]],"date-time":"2009-06-24T16:47:07Z","timestamp":1245862027000},"page":"11-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["On sorting and load balancing on GPUs"],"prefix":"10.1145","volume":"36","author":[{"given":"Daniel","family":"Cederman","sequence":"first","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philippas","family":"Tsigas","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,6,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/277651.277678"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87744-8_21"},{"key":"e_1_2_1_3_1","first-page":"57","volume-title":"Proceedings of the 11th Graphics Hardware (GH 2008","author":"Cederman D.","year":"2008","unstructured":"D. Cederman and P. Tsigas . On Dynamic Load Balancing on Graphics Processors . In Proceedings of the 11th Graphics Hardware (GH 2008 ), pages 57 -- 64 . ACM press, 2008 . D. Cederman and P. Tsigas. On Dynamic Load Balancing on Graphics Processors. In Proceedings of the 11th Graphics Hardware (GH 2008), pages 57--64. ACM press, 2008."},{"key":"e_1_2_1_4_1","volume-title":"December","author":"Cederman D.","year":"2007","unstructured":"D. Cederman and P. Tsigas . GPU Quicksort Library. www.cs.chalmers.se\/\u00bfdcs\/gpuqsortdcs.html , December 2007 . D. Cederman and P. Tsigas. GPU Quicksort Library. www.cs.chalmers.se\/\u00bfdcs\/gpuqsortdcs.html, December 2007."},{"key":"e_1_2_1_5_1","unstructured":"N. CUDA. www.nvidia.com\/cuda.  N. CUDA. www.nvidia.com\/cuda."},{"key":"e_1_2_1_6_1","volume-title":"A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors. Technical report","author":"Govindaraju N.","year":"2005","unstructured":"N. Govindaraju , N. Raghuvanshi , M. Henson , and D. Manocha . A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors. Technical report , University of North Carolina-Chapel Hill , 2005 . N. Govindaraju, N. Raghuvanshi, M. Henson, and D. Manocha. A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors. Technical report, University of North Carolina-Chapel Hill, 2005."},{"key":"e_1_2_1_7_1","volume-title":"Parallel Prefix Sum (Scan) with CUDA","author":"Harris M.","year":"2007","unstructured":"M. Harris , S. Sengupta , and J. D. Owens . Parallel Prefix Sum (Scan) with CUDA . In H. Nguyen, editor, GPU Gems 3. Addison Wesley , Aug. 2007 . M. Harris, S. Sengupta, and J. D. Owens. Parallel Prefix Sum (Scan) with CUDA. In H. Nguyen, editor, GPU Gems 3. Addison Wesley, Aug. 2007."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1998.1462"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/272991.272995"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-#"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1280094.1280110"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1620320406"},{"key":"e_1_2_1_13_1","volume-title":"Workshop on General Purpose Processing on Graphics Processing Units","author":"Sintorn E.","year":"2007","unstructured":"E. Sintorn and U. Assarsson . Fast Parallel GPU-Sorting Using a Hybrid Algorithm . In Workshop on General Purpose Processing on Graphics Processing Units , 2007 . E. Sintorn and U. Assarsson. Fast Parallel GPU-Sorting Using a Hybrid Algorithm. In Workshop on General Purpose Processing on Graphics Processing Units, 2007."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/378580.378611"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1556444.1556447","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1556444.1556447","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:38:46Z","timestamp":1750253926000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1556444.1556447"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12,20]]},"references-count":14,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2008,12,20]]}},"alternative-id":["10.1145\/1556444.1556447"],"URL":"https:\/\/doi.org\/10.1145\/1556444.1556447","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2008,12,20]]},"assertion":[{"value":"2009-06-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}