{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T03:07:41Z","timestamp":1780456061156,"version":"3.54.1"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2009,12,1]],"date-time":"2009-12-01T00:00:00Z","timestamp":1259625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:p>In this article, we describe GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multicore graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that in CUDA, NVIDIA's programing platform for general-purpose computations on graphical processors, GPU-Quicksort performs better than the fastest-known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.<\/jats:p>","DOI":"10.1145\/1498698.1564500","type":"journal-article","created":{"date-parts":[[2010,4,7]],"date-time":"2010-04-07T02:56:32Z","timestamp":1270608992000},"source":"Crossref","is-referenced-by-count":44,"title":["GPU-Quicksort"],"prefix":"10.1145","volume":"14","author":[{"given":"Daniel","family":"Cederman","sequence":"first","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Philippas","family":"Tsigas","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2010,1,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/0218014"},{"key":"e_1_2_1_2_1","volume-title":"Synthesis of Parallel Algorithms","author":"Blelloch G. E.","unstructured":"Blelloch , G. E. 1993. Prefix sums and their applications . In Synthesis of Parallel Algorithms , J. H. Reif, Ed. Morgan Kaufmann , San Francisco . Blelloch, G. E. 1993. Prefix sums and their applications. In Synthesis of Parallel Algorithms, J. H. Reif, Ed. Morgan Kaufmann, San Francisco."},{"key":"e_1_2_1_3_1","unstructured":"Cederman D. and Tsigas P. 2007. GPU Quicksort Library. http:\/\/www.cs.chalmers.se\/~dcs\/gpuqsortdcs.html.  Cederman D. and Tsigas P. 2007. GPU Quicksort Library. http:\/\/www.cs.chalmers.se\/~dcs\/gpuqsortdcs.html."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/76359.76362"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1080\/00207168208803323"},{"key":"e_1_2_1_6_1","unstructured":"Govindaraju N. Raghuvanshi N. Henson M. and Manocha D. 2005. A cache-efficient sorting algorithm for database and data mining computations using graphics processors. Tech. rep. Univ. of North Carolina-Chapel Hill.  Govindaraju N. Raghuvanshi N. Henson M. and Manocha D. 2005. A cache-efficient sorting algorithm for database and data mining computations using graphics processors. Tech. rep. Univ. of North Carolina-Chapel Hill."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142511"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066227"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. IEEE, Los Alamitos.","author":"Gress A.","unstructured":"Gress , A. and Zachmann , G . 2006. GPU-ABiSort: Optimal parallel sorting on stream architectures . In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. IEEE, Los Alamitos. Gress, A. and Zachmann, G. 2006. GPU-ABiSort: Optimal parallel sorting on stream architectures. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. IEEE, Los Alamitos."},{"key":"e_1_2_1_10_1","unstructured":"Harris M. Sengupta S. and Owens J. D. 2007. Parallel prefix sum (scan) with CUDA. In GPU Gems 3 H. Nguyen Ed. Addison Wesley Upper Saddle River.  Harris M. Sengupta S. and Owens J. D. 2007. Parallel prefix sum (scan) with CUDA. In GPU Gems 3 H. Nguyen Ed. Addison Wesley Upper Saddle River."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.46289"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1998.1462"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/366622.366644"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/5.1.10"},{"key":"e_1_2_1_15_1","volume-title":"Introduction to Parallel Algorithms","author":"Jaja J.","unstructured":"Jaja , J. 1992. Introduction to Parallel Algorithms . Addison-Wesley , Upper Saddle River. Jaja, J. 1992. Introduction to Parallel Algorithms. Addison-Wesley, Upper Saddle River."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360145"},{"key":"e_1_2_1_17_1","unstructured":"Khronos Group. 2008. OpenCL (Open Computing Language). http:\/\/www.khronos.org\/opencl\/.  Khronos Group. 2008. OpenCL (Open Computing Language). http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1058129.1058146"},{"key":"e_1_2_1_19_1","unstructured":"Kipfer P. and Westermann R. 2005. Improved GPU sorting. In GPUGems 2 M. Pharr Ed. Addison-Wesley Upper Saddle River 733--746.  Kipfer P. and Westermann R. 2005. Improved GPU sorting. In GPUGems 2 M. Pharr Ed. Addison-Wesley Upper Saddle River 733--746."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/272991.272995"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/261387.261395"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Graphics Hardware. ACM","author":"Purcell T. J.","unstructured":"Purcell , T. J. , Donner , C. , Cammarano , M. , Jensen , H. W. , and Hanrahan , P . 2003. Photon mapping on programmable graphics hardware . In Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Graphics Hardware. ACM , New York, 41--50. Purcell, T. J., Donner, C., Cammarano, M., Jensen, H. W., and Hanrahan, P. 2003. Photon mapping on programmable graphics hardware. In Proceedings of the ACM SIGGRAPH\/Eurographics Symposium on Graphics Hardware. ACM, New York, 41--50."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/359619.359631"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 22nd ACM SIGGRAPH\/EUROGRAPHICS Symposium on Graphics Hardware. ACM","author":"Sengupta S.","unstructured":"Sengupta , S. , Harris , M. , Zhang , Y. , and Owens , J. D . 2007. Scan primitives for GPU computing . In Proceedings of the 22nd ACM SIGGRAPH\/EUROGRAPHICS Symposium on Graphics Hardware. ACM , New York, 97--106. Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for GPU computing. In Proceedings of the 22nd ACM SIGGRAPH\/EUROGRAPHICS Symposium on Graphics Hardware. ACM, New York, 97--106."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/362875.362901"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the Workshop on General Purpose Processing on Graphics Processing Units. ACM","author":"Sintorn E.","unstructured":"Sintorn , E. and Assarsson , U . 2007. Fast parallel GPU-sorting using a hybrid algorithm . In Proceedings of the Workshop on General Purpose Processing on Graphics Processing Units. ACM , New York. Sintorn, E. and Assarsson, U. 2007. Fast parallel GPU-sorting using a hybrid algorithm. In Proceedings of the Workshop on General Purpose Processing on Graphics Processing Units. ACM, New York."},{"key":"e_1_2_1_27_1","unstructured":"Stanford. 2008. The Stanford 3D scanning repository. http:\/\/www.graphics.stanford.edu\/data\/3Dscanrep.  Stanford. 2008. The Stanford 3D scanning repository. http:\/\/www.graphics.stanford.edu\/data\/3Dscanrep."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 11th Euromicro- Conference on Parallel Distributed and Network-based Processing. IEEE, Los Alamitos, 372--381","author":"Tsigas P.","unstructured":"Tsigas , P. and Zhang , Y . 2003. A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000 . In Proceedings of the 11th Euromicro- Conference on Parallel Distributed and Network-based Processing. IEEE, Los Alamitos, 372--381 . Tsigas, P. and Zhang, Y. 2003. A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000. In Proceedings of the 11th Euromicro- Conference on Parallel Distributed and Network-based Processing. IEEE, Los Alamitos, 372--381."}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1498698.1564500","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1498698.1564500","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:45:43Z","timestamp":1750250743000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1498698.1564500"}},"subtitle":["A practical Quicksort algorithm for graphics processors"],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":28,"alternative-id":["10.1145\/1498698.1564500"],"URL":"https:\/\/doi.org\/10.1145\/1498698.1564500","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"value":"1084-6654","type":"print"},{"value":"1084-6654","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,12]]}}}