{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T22:42:20Z","timestamp":1774219340408,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,1,28]],"date-time":"2022-01-28T00:00:00Z","timestamp":1643328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"CRISP, one of six centers in JUMP, an SRC program sponsored by DARPA"},{"name":"SRC Global Research Collaboration (GRC) grant"},{"name":"NSF","award":["1730158, 1911095, 2003279, and 1826967"],"award-info":[{"award-number":["1730158, 1911095, 2003279, and 1826967"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>As the size of data generated every day grows dramatically, the computational bottleneck of computer systems has shifted toward storage devices. The interface between the storage and the computational platforms has become the main limitation due to its limited bandwidth, which does not scale when the number of storage devices increases. Interconnect networks do not provide simultaneous access to all storage devices and thus limit the performance of the system when executing independent operations on different storage devices. Offloading the computations to the storage devices eliminates the burden of data transfer from the interconnects. Near-storage computing offloads a portion of computations to the storage devices to accelerate big data applications. 
In this article, we propose a generic near-storage sort accelerator for data analytics, NASCENT2, which utilizes Samsung SmartSSD, an NVMe flash drive with an on-board FPGA chip that processes data in situ.<\/jats:p>\n          <jats:p>NASCENT2 consists of dictionary decoder, sort, and shuffle FPGA-based accelerators to support sorting database tables based on a key column with any arbitrary data type. It exploits data partitioning applied by data processing management systems, such as SparkSQL, to break down the sort operations on colossal tables into multiple sort operations on smaller tables. NASCENT2 generic sort provides 2\u00d7 speedup and 15.2\u00d7 energy efficiency improvement as compared to the CPU baseline. Moreover, it considers the specifications of the SmartSSD (e.g., the FPGA resources, interconnect network, and solid-state drive bandwidth) to increase the scalability of computer systems as the number of storage devices increases. With 12 SmartSSDs, NASCENT2 is 9.9\u00d7 (137.2\u00d7) faster and 7.3\u00d7 (119.2\u00d7) more energy efficient in sorting the largest tables of TPCC and TPCH benchmarks than the FPGA (CPU) baseline.<\/jats:p>","DOI":"10.1145\/3472769","type":"journal-article","created":{"date-parts":[[2022,1,29]],"date-time":"2022-01-29T06:14:48Z","timestamp":1643436888000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,"title":["NASCENT2: Generic Near-Storage Sort Accelerator for Data Analytics on SmartSSD"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3022-8212","authenticated-orcid":false,"given":"Sahand","family":"Salamat","sequence":"first","affiliation":[{"name":"UC San Diego, La Jolla, CA"}]},{"given":"Hui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Samsung Semiconductor Inc., San Jose, CA"}]},{"given":"Yang Seok","family":"Ki","sequence":"additional","affiliation":[{"name":"Samsung 
Semiconductor Inc., San Jose, CA"}]},{"given":"Tajana","family":"Rosing","sequence":"additional","affiliation":[{"name":"UC San Diego, La Jolla, CA"}]}],"member":"320","published-online":{"date-parts":[[2022,1,28]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"Database Management Systems","author":"Ramakrishnan Raghu","year":"2003","unstructured":"Raghu Ramakrishnan and Johannes Gehrke. 2003. Database Management Systems. Vol. 3. McGraw-Hill, New York, NY."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-0617-4_29"},{"key":"e_1_3_2_4_2","volume-title":"Gullfoss: Accelerating and Simplifying Data Movement Among Heterogeneous Computing and Storage Resources","author":"Tseng Hung-Wei","year":"2015","unstructured":"Hung-Wei Tseng, Yang Liu, Mark Gahagan, Jing Li, Yanqin Jing, and Steven J. Swanson. 2015. Gullfoss: Accelerating and Simplifying Data Movement Among Heterogeneous Computing and Storage Resources. Department of Computer Science and Engineering, University of California."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942135"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124553"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIGCOM.2019.00024"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230560"},{"key":"e_1_3_2_9_2","first-page":"379","volume-title":"Proceedings of the 2019 Annual Technical Conference (USENIX ATC\u201919)","author":"Ruan Zhenyuan","year":"2019","unstructured":"Zhenyuan Ruan, Tong He, and Jason Cong. 2019. INSIDER: Designing in-storage computing system for emerging high-performance drive. In Proceedings of the 2019 Annual Technical Conference (USENIX ATC\u201919). 379\u2013394."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/384265.291026"},{"key":"e_1_3_2_11_2","unstructured":"https:\/\/samsungsemiconductor-us.com\/smartssd\/. Samsung. n.d. 
SmartSSD"},{"key":"e_1_3_2_12_2","unstructured":"http:\/\/www.scaleflux.com\/ ScaleFlux"},{"key":"e_1_3_2_13_2","volume-title":"White Paper: Smarter Data Storage\u2014A Guide to Computational Storage on ARM","year":"2019","unstructured":"ARM. 2019. White Paper: Smarter Data Storage\u2014A Guide to Computational Storage on ARM. Technical Report. ARM."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2933349.2933353"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2019.2929288"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3415580"},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of 10th International Workshop on Accelerating Analytics and Data Management Systems (ADMS\u201919)","author":"Chapman Keith","year":"2019","unstructured":"Keith Chapman, Mehdi Nik, Behnam Robatmili, Shahrzad Mirkhani, and Maysam Lavasani. 2019. Computational storage for big data analytics. In Proceedings of 10th International Workshop on Accelerating Analytics and Data Management Systems (ADMS\u201919)."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994512"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465295"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.14778\/2732967.2732972"},{"key":"e_1_3_2_21_2","first-page":"60","article-title":"Networking and storage: The next computing elements in exascale systems?","volume":"43","author":"Lerner Alberto","year":"2020","unstructured":"Alberto Lerner, Rana Hussein, Andr\u00e9 Ryser, Sangjin Lee, and Philippe Cudr\u00e9-Mauroux. 2020. 
Networking and storage: The next computing elements in exascale systems? IEEE Data Engineering Bulletin 43 (2020), 60\u201371.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.839320"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/lca.2020.3009347"},{"key":"e_1_3_2_24_2","article-title":"Workload-aware opportunistic energy efficiency in multi-FPGA platforms","author":"Salamat Sahand","year":"2019","unstructured":"Sahand Salamat, Behnam Khaleghi, Mohsen Imani, and Tajana Rosing. 2019. Workload-aware opportunistic energy efficiency in multi-FPGA platforms. arXiv preprint arXiv:1908.06519 (2019).","journal-title":"arXiv preprint arXiv:1908.06519"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-018-3761-1"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394885.3431541"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2992662"},{"key":"e_1_3_2_28_2","unstructured":"Sahand Salamat Hui Zhang Joo Hwan Lee and Yang Seok Ki. 2021. System and method for hierarchical sort acceleration near storage. US Patent App. 16\/821 811. April 29 2021."},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1145\/3358960.3375794","volume-title":"Proceedings of the ACM\/SPEC International Conference on Performance Engineering","author":"Reis Veronica Lagrange Moutinho dos","year":"2020","unstructured":"Veronica Lagrange Moutinho dos Reis, Harry Li, and Anahita Shayesteh. 2020. Modeling analytics for computational storage. In Proceedings of the ACM\/SPEC International Conference on Performance Engineering. 88\u201399."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007331"},{"key":"e_1_3_2_31_2","first-page":"1","article-title":"IBM PureData system for analytics architecture","author":"Francisco Phil","year":"2014","unstructured":"Phil Francisco. 2014. 
IBM PureData system for analytics architecture. IBM Redbooks 2014, 1\u201316.","journal-title":"IBM Redbooks"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1132960.1132964"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP49362.2020.00031"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00111"},{"key":"e_1_3_2_35_2","unstructured":"Ionut Boicu. 2019. Adaptive On-the-Fly Compressed Execution in Spark. Master\u2019s Thesis. Vrije Universiteit Amsterdam."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439298"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137776"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2898996"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001154"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3310149"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465003"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2595566"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0265-5"},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the 1st Workshop on Near-Data Processing","author":"Cho Benjamin Y.","year":"2013","unstructured":"Benjamin Y. Cho, Won Seob Jeong, Doohwan Oh, and Won Woo Ro. 2013. XSD: Accelerating MapReduce by harnessing the GPU inside an SSD. 
In Proceedings of the 1st Workshop on Near-Data Processing."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2013.6645619"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICECTE.2016.7879576"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.21609\/jiki.v9i2.378"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-09661-2_2"},{"key":"e_1_3_2_50_2","first-page":"80080E","volume-title":"Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2011","author":"Zabo\u0142otny Wojciech M.","year":"2011","unstructured":"Wojciech M. Zabo\u0142otny. 2011. Dual port memory based heapsort implementation for FPGA. In Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2011, Vol. 8008. International Society for Optics and Photonics, 80080E."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM48280.2020.00055"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/FIT.2014.48"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00033"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.117"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3399666.3399897"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950427"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-011-0232-z"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCI.2013.6612389"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689068"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSC.2018.8585361"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375304"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.53"},{"key":"e_1_3_2_63_2","doi-asserted-by":"pub
lisher","DOI":"10.1007\/978-3-642-40450-4_15"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_3_2_65_2","unstructured":"Retrieved January 21 2021 from http:\/\/www.tpc.org\/tpch\/. TPC. n.d. TPC-H"},{"key":"e_1_3_2_66_2","unstructured":"C++ Libraries. Home Page. Retrieved April 2 2021 from https:\/\/www.boost.org\/."},{"key":"e_1_3_2_67_2","unstructured":"http:\/\/www.tpc.org\/tpcc\/. TPC. n.d. TPCC Benchmark"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472769","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472769","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472769","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:09Z","timestamp":1750191429000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,28]]},"references-count":66,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3472769"],"URL":"https:\/\/doi.org\/10.1145\/3472769","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,28]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-01-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}