{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:48:15Z","timestamp":1750308495216,"version":"3.41.0"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61521003"],"award-info":[{"award-number":["61521003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            Fast sorting of large-scale data is an essential task for data centers. In previous works, the existing computational model of sorting kernel still results in lower bandwidth utilization on the external memory bus. And the execution of merge operations in merge sort circuit on FPGAs depends on control commands from the host CPU. In this case, the merge sort circuit is not fully offloaded to hardware layer for acceleration, resulting in a performance loss. We design an on-chip merge sort controller to efficiently command the merge sort process. The proposed controller has the ability to schedule multiple on-chip computing kernels simultaneously in a more efficient mode, thus ensuring that the circuit has a better bandwidth utilization. Meanwhile, fundamental factors affecting the performance of merge sort are studied and analyzed, and we propose a high-performance merge sort architecture. Results show that using the proposed controller-centered architecture, an overall improvement of 20%\u201330% in sorting throughput can be achieved. Compared with the state-of-the-art previous merge sorting implementation on FPGA, our circuit can achieve 1.22\/1.46\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            speedup.\n          <\/jats:p>","DOI":"10.1145\/3716392","type":"journal-article","created":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T16:54:51Z","timestamp":1738774491000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FPGA-Based Large-Scale Sorting with Optimized Bandwidth Utilization"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4304-2836","authenticated-orcid":false,"given":"Mingqian","family":"Sun","sequence":"first","affiliation":[{"name":"School of Cyber Science and Engineering, Southeast University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6543-8173","authenticated-orcid":false,"given":"Guangwei","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7456-8377","authenticated-orcid":false,"given":"Fan","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Digital Switching System Engineering and Technological Research Center, Zhengzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1023-7277","authenticated-orcid":false,"given":"Wei","family":"Guo","sequence":"additional","affiliation":[{"name":"National Digital Switching System Engineering and Technological Research Center, Zhengzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8698-6584","authenticated-orcid":false,"given":"Xitian","family":"Fan","sequence":"additional","affiliation":[{"name":"Hongzhen Information Technology Co., Ltd, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8206-5255","authenticated-orcid":false,"given":"Li","family":"Chen","sequence":"additional","affiliation":[{"name":"National Digital Switching System Engineering and Technological Research Center, Zhengzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8287-709X","authenticated-orcid":false,"given":"Jiayu","family":"Du","sequence":"additional","affiliation":[{"name":"Purple Mountain Laboratories, Nanjing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Xilinx Advanced Micro Devices Inc. 2024. AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276). Retrieved from https:\/\/docs.xilinx.com\/r\/en-US\/pg276-axi-hbm"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3506713"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00067"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375304"},{"issue":"8","key":"e_1_3_2_7_2","first-page":"1449","article-title":"A hybrid pipelined architecture for high performance TOP-K sorting on FPGA","volume":"67","author":"Chen Weijie","year":"2019","unstructured":"Weijie Chen, Weijun Li, and Feng Yu. 2019. A hybrid pipelined architecture for high performance TOP-K sorting on FPGA. IEEE Transactions on Circuits and Systems II: Express Briefs 67, 8 (2019), 1449\u20131453.","journal-title":"IEEE Transactions on Circuits and Systems II: Express Briefs"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824050"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCC51575.2020.9345012"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2012.108"},{"key":"e_1_3_2_11_2","unstructured":"Phillip Griffith. 2024. AMR5. Retrieved from https:\/\/sortbenchmark.org\/AMR5Sort2023.pdf"},{"key":"e_1_3_2_12_2","unstructured":"Yonathan Mendelev Igor Mendelev and Levi Mendelev. 2023. MendSort. Retrieved from https:\/\/sortbenchmark.org\/MendSort2023.pdf"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.53"},{"key":"e_1_3_2_14_2","first-page":"559","article-title":"Sorting and Searching","volume":"422","author":"Knuth Donald E.","year":"1973","unstructured":"Donald E. Knuth. 1973. Sorting and Searching. The Art of Computer Programming 422 (1973), 559\u2013563.","journal-title":"The Art of Computer Programming"},{"key":"e_1_3_2_15_2","unstructured":"Chris Nyberg. 2024. Sort Benchmark. Retrieved from http:\/\/www.ordinal.com\/gensort.html"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2018.00022"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00021"},{"issue":"12","key":"e_1_3_2_18_2","first-page":"3215","article-title":"FLiMS: A fast lightweight 2-way merger for sorting","volume":"71","author":"Papaphilippou Philippos","year":"2022","unstructured":"Philippos Papaphilippou, Wayne Luk, and Chris Brooks. 2022. FLiMS: A fast lightweight 2-way merger for sorting. IEEE Transactions on Computers 71, 12 (2022), 3215\u20133226.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM51124.2021.00020"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2018.00038"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439298"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00033"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.34"},{"key":"e_1_3_2_24_2","unstructured":"Xilinx. 2024. UltraScale Architecture-Based FPGAs Memory IP LogiCORE IP Product Guide (PG150). Retrieved from https:\/\/docs.amd.com\/r\/en-US\/pg150-ultrascale-memory-ip"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716392","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716392","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:43Z","timestamp":1750272223000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716392"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3716392"],"URL":"https:\/\/doi.org\/10.1145\/3716392","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2025,3,21]]},"assertion":[{"value":"2024-07-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}