{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:35Z","timestamp":1750306115014,"version":"3.41.0"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,1,11]],"date-time":"2017-01-11T00:00:00Z","timestamp":1484092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2017,1,11]]},"abstract":"<jats:p>High-performance sorting is used in various areas such as database transactions and genomic feature operations. To improve sorting performance, in addition to the conventional approach of using general purpose processors or GPUs, the approach of using FPGAs is becoming a promising solution. As an FPGA sorting accelerator, Casper and Olukotun have recently proposed the fastest one known so far. In their study, they proposed a merge network which can merge two sorted data series at a throughput of 6 data elements per 200MHz clock cycle.<\/jats:p>\n          <jats:p>If an FPGA sorting accelerator is constructed using merge networks, the overall throughput will be mainly determined by the throughputs of the merge networks. This motivates us to design a merge network which outputs more than 6 data elements per 200MHz clock cycle. In this paper, we propose a cost-effective and high-throughput merge network for the fastest FPGA sorting accelerator. 
The evaluation shows that our proposal achieves a throughput of 8 data elements per 200MHz clock cycle.<\/jats:p>","DOI":"10.1145\/3039902.3039905","type":"journal-article","created":{"date-parts":[[2017,1,17]],"date-time":"2017-01-17T13:42:08Z","timestamp":1484660528000},"page":"8-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Cost-Effective and High-Throughput Merge Network"],"prefix":"10.1145","volume":"44","author":[{"given":"Susumu","family":"Mashimo","sequence":"first","affiliation":[{"name":"Tokyo Institute of Technology, Japan"}]},{"given":"Thiem","family":"Van Chu","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Japan"}]},{"given":"Kenji","family":"Kise","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Japan"}]}],"member":"320","published-online":{"date-parts":[[2017,1,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132960.1132964"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bts277"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2554688.2554787"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.342136"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809988"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824050"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807207"},{"key":"e_1_2_1_9_1","volume-title":"Implicit radix sorting on gpus","author":"Ha J.","year":"2010","unstructured":"J. Ha, Jens Kr\u00fcger, and Cl\u00e1udio T. Silva. Implicit radix sorting on gpus. 
2010."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2014.03.003"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950427"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.34"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3039902.3039905","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3039902.3039905","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:36:31Z","timestamp":1750217791000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3039902.3039905"}},"subtitle":["Architecture for the Fastest FPGA Sorting Accelerator"],"short-title":[],"issued":{"date-parts":[[2017,1,11]]},"references-count":12,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,1,11]]}},"alternative-id":["10.1145\/3039902.3039905"],"URL":"https:\/\/doi.org\/10.1145\/3039902.3039905","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2017,1,11]]},"assertion":[{"value":"2017-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}