{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T05:42:21Z","timestamp":1740980541753,"version":"3.38.0"},"reference-count":37,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2014,3,21]],"date-time":"2014-03-21T00:00:00Z","timestamp":1395360000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p> Sorting has been a topic of immense research value since the inception of computer science. Hybrid computing on multicore architectures involves computing simultaneously on a tightly coupled heterogeneous collection of devices. In this work, we consider a multicore CPU along with a manycore GPU as our experimental hybrid platform. <\/jats:p><jats:p> In this work, we present a hybrid comparison based sorting algorithm which utilizes a many-core GPU and a multi-core CPU to perform sorting. The algorithm is broadly based on splitting the input list according to a large number of splitters followed by creating independent sublists. Sorting the independent sublists results in sorting the entire original list. <\/jats:p><jats:p> On a CPU + GPU platform consisting of an Intel i7-980X and an NVidia GTX 580, our algorithm achieves a 20% gain over the current best known comparison sort result that was published by (Davidson et al., 2012). On the above experimental platform, our results are better by 40% on average over a similar GPU-alone algorithm proposed by (Leischner et al., 2010). We also extend our sorting algorithm for fixed length keys to variable length keys. We use a look-ahead based approach to sort strings and obtain around a 24% benefit compared to the current best known solution. Our results also show that our algorithm and its implementation scale with the size of the input. We also show that such performance gains can be obtained on other hybrid CPU + GPU platforms. <\/jats:p>","DOI":"10.1177\/1094342014526906","type":"journal-article","created":{"date-parts":[[2014,3,22]],"date-time":"2014-03-22T04:16:45Z","timestamp":1395461805000},"page":"267-284","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Comparison sorting on hybrid multicore architectures for fixed and variable length keys"],"prefix":"10.1177","volume":"28","author":[{"given":"Dip Sankar","family":"Banerjee","sequence":"first","affiliation":[{"name":"Center For Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad, India"}]},{"given":"Parikshit","family":"Sakurikar","sequence":"additional","affiliation":[{"name":"Center For Visual Information Technology, International Institute of Information Technology, Hyderabad, India"}]},{"given":"Kishore","family":"Kothapalli","sequence":"additional","affiliation":[{"name":"Center For Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad, India"}]}],"member":"179","published-online":{"date-parts":[[2014,3,21]]},"reference":[{"key":"bibr1-1094342014526906","first-page":"1","author":"Bandyopadhyay S","year":"2010","journal-title":"Proceedings of international conference on high performance computing (HiPC), 2010"},{"key":"bibr2-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/ISCC.2010.5546734"},{"key":"bibr3-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719642"},{"volume-title":"Vector Models for Data-parallel Computing","year":"1990","author":"Blelloch GE","key":"bibr4-1094342014526906"},{"key":"bibr5-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/113379.113380"},{"key":"bibr6-1094342014526906","first-page":"4:1.4","volume":"14","author":"Cederman D","year":"2010","journal-title":"Journal on Experimental Algorithmics"},{"volume-title":"Parallel Programming in OpenMP","year":"2001","author":"Chandra R","key":"bibr7-1094342014526906"},{"key":"bibr8-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1137\/0217049"},{"key":"bibr9-1094342014526906","unstructured":"Corporation NVidia (2007) Cuda: Compute Unified Device Architecture programming guide. Technical report."},{"key":"bibr10-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/InPar.2012.6339592"},{"key":"bibr11-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/71.532111"},{"key":"bibr12-1094342014526906","unstructured":"Frank D, Zaboli H (2010) Deterministic sample sort For GPUs. Available at: http:\/\/arxiv.org\/pdf\/1002.4464.pdf."},{"key":"bibr13-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142511"},{"key":"bibr14-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1998.1462"},{"key":"bibr15-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/366622.366644"},{"key":"bibr16-1094342014526906","first-page":"627","volume-title":"Proceedings of 7th computer software and applications conference","author":"Huang J","year":"1983"},{"key":"bibr17-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1816021"},{"key":"bibr18-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470444"},{"key":"bibr19-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/272991.272995"},{"key":"bibr20-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854344"},{"key":"bibr21-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-024X(199708)27:8<983::AID-SPE117>3.0.CO;2-#"},{"key":"bibr22-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"bibr23-1094342014526906","unstructured":"NVidea (2007) Cuda: Compute Unified Device Architecture programming guide. Technical report."},{"volume-title":"Scalable split and gather primitives for the GPU","year":"2009","author":"Patidar S","key":"bibr24-1094342014526906"},{"key":"bibr25-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1686"},{"key":"bibr26-1094342014526906","unstructured":"Podlozhnyuk V (2009) Histogram calculation in CUDA. Available at: http:\/\/developer.download.nvidia.com\/compute\/cuda\/1.1-Beta\/x86\\_64\\_website\/histogram256\/doc\/histogram.pdf."},{"key":"bibr27-1094342014526906","first-page":"41","volume-title":"Proceedings of the ACM SIGGRAPH\/EUROGRAPHICS conference on graphics hardware (HWWS \u201903)","author":"Purcell TJ","year":"2003"},{"key":"bibr28-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542311"},{"key":"bibr29-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161005"},{"key":"bibr30-1094342014526906","doi-asserted-by":"crossref","unstructured":"Satish N, Kim C, Chhugani J, (2010) Fast sort on CPUs, GPUs and Intel MIC architectures. Technical report, Intel Labs.","DOI":"10.1145\/1807167.1807207"},{"key":"bibr31-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/1230100.1230105"},{"key":"bibr32-1094342014526906","first-page":"97","volume-title":"Proceedings of the 22nd ACM SIGGRAPH\/EUROGRAPHICS symposium on graphics hardware","author":"Sengupta S","year":"2007"},{"key":"bibr33-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74466-5_72"},{"key":"bibr34-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.05.012"},{"key":"bibr35-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1109\/SUPERC.1992.236714"},{"journal-title":"The 24th IEEE international parallel and distributed processing symposium (IPDPS)","year":"2010","author":"Wei Z","key":"bibr36-1094342014526906"},{"key":"bibr37-1094342014526906","doi-asserted-by":"publisher","DOI":"10.1145\/125826.126164"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014526906","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342014526906","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014526906","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T13:58:43Z","timestamp":1740923923000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342014526906"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,21]]},"references-count":37,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1177\/1094342014526906"],"URL":"https:\/\/doi.org\/10.1177\/1094342014526906","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2014,3,21]]}}}