{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T22:07:43Z","timestamp":1766182063211,"version":"3.41.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61832011"],"award-info":[{"award-number":["61832011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Open Research Program of Zhejiang Lab","award":["2020KC0AB03"],"award-info":[{"award-number":["2020KC0AB03"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,11,30]]},"abstract":"<jats:p>We present the design and implementation of TH-iSSD, a near-data processing framework to address the data movement problem. TH-iSSD does not pose any restriction to the hardware selection and is highly reconfigurable\u2014its core components, such as the on-device compute unit (e.g., FPGA, embedded CPUs) and data collectors (e.g., camera, sensors), can be easily replaced to adapt to different use cases. TH-iSSD achieves this goal by incorporating highly flexible computation and data paths. In the data path, TH-iSSD adopts an efficient device-level data switch that exchanges data with both host CPUs and peripheral sensors; it also enables direct accesses between the sensing, computation, and storage hardware components, which completely eliminates the redundant data movement overhead, and thus delivers both high performance and energy efficiency. 
In the computation path, TH-iSSD provides an abstraction of <jats:italic>filestream<\/jats:italic> for developers, which abstracts a collection of data along with the related computation task as a file. Since existing applications are familiar with POSIX-like interfaces, they can be ported on top of our platform with minimal code modification. Moreover, TH-iSSD also introduces mechanisms including <jats:italic>pipelined near-data processing<\/jats:italic> and <jats:italic>priority-aware I\/O scheduling<\/jats:italic> to make TH-iSSD perform more effectively. We deploy TH-iSSD to accelerate two types of applications: the content-based information retrieval system and the edge zero-streaming system. Our experimental results show that TH-iSSD achieves up to 1.6\u00d7 higher throughput and 36% lower latency than <jats:italic>compute-centric<\/jats:italic> designs.<\/jats:p>","DOI":"10.1145\/3563456","type":"journal-article","created":{"date-parts":[[2023,2,20]],"date-time":"2023-02-20T11:47:40Z","timestamp":1676893660000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["TH-iSSD: Design and Implementation of a Generic and Reconfigurable Near-Data Processing Framework"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7362-2789","authenticated-orcid":false,"given":"Jiwu","family":"Shu","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2387-4160","authenticated-orcid":false,"given":"Kedong","family":"Fang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4171-4299","authenticated-orcid":false,"given":"Youmin","family":"Chen","sequence":"additional","affiliation":[{"name":"Tsinghua 
University"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0924-4477","authenticated-orcid":false,"given":"Shuo","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2023,11,9]]},"reference":[{"unstructured":"Intel. (n.d.). Intel Optane SSD DC P4800X. Retrieved February 27 2023 from https:\/\/www.intel.com\/content\/www\/us\/en\/solid-state-drives\/optane-ssd-dc-p4800x-brief.html.","key":"e_1_3_1_2_2"},{"unstructured":"Samsung. (n.d.) Samsung NVMe SSD 960 Pro. Retrieved February 27 2023 from https:\/\/www.samsung.com\/us\/computing\/memory-storage\/solid-state-drives\/ssd-960-pro-m-2-512gb-mz-v6p512bw\/.","key":"e_1_3_1_3_2"},{"unstructured":"Google Cloud. (n.d.) Cloud TPU. Retrieved February 27 2023 from https:\/\/cloud.google.com\/tpu.","key":"e_1_3_1_4_2"},{"issue":"5","key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1145\/384265.291026","article-title":"Active disks: Programming model, algorithms and evaluation","volume":"32","author":"Acharya Anurag","year":"1998","unstructured":"Anurag Acharya, Mustafa Uysal, and Joel Saltz. 1998. Active disks: Programming model, algorithms and evaluation. ACM SIGOPS Operating Systems Review 32, 5 (1998), 81\u201391.","journal-title":"ACM SIGOPS Operating Systems Review"},{"key":"e_1_3_1_6_2","first-page":"T174\u2013T175","volume-title":"Proceedings of the 2017 Symposium on VLSI Technology","author":"Agarwal Sapan","year":"2017","unstructured":"Sapan Agarwal, Robin B. Jacobs Gedrim, Alexander H. Hsia, David R. Hughart, Elliot J. Fuller, A. Alec Talin, Conrad D. James, Steven J. Plimpton, and Matthew J. Marinella. 2017. Achieving ideal accuracies in analog neuromorphic computing using periodic carry. In Proceedings of the 2017 Symposium on VLSI Technology. 
IEEE, Los Alamitos, CA, T174\u2013T175."},{"doi-asserted-by":"publisher","key":"e_1_3_1_7_2","DOI":"10.1109\/JPROC.2010.2070830"},{"key":"e_1_3_1_8_2","first-page":"1","volume-title":"Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles","author":"Andersen David G.","year":"2009","unstructured":"David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. 1\u201314."},{"doi-asserted-by":"publisher","key":"e_1_3_1_9_2","DOI":"10.1109\/MM.2014.55"},{"issue":"01","key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/MM.2016.1","article-title":"Near-data processing [Guest editors\u2019 introduction]","volume":"36","author":"Balasubramonian Rajeev","year":"2016","unstructured":"Rajeev Balasubramonian and Boris Grot. 2016. Near-data processing [Guest editors\u2019 introduction]. IEEE Micro 36, 01 (2016), 4\u20135.","journal-title":"IEEE Micro"},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/3102980.3102990","volume-title":"Proceedings of the 16th Workshop on Hot Topics in Operating Systems","author":"Barbalace Antonio","year":"2017","unstructured":"Antonio Barbalace, Anthony Iliopoulos, Holm Rauchfuss, and Goetz Brasche. 2017. It\u2019s time to think about an operating system for near data processing architectures. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems. 56\u201361."},{"issue":"11","key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"3498","DOI":"10.1109\/TED.2015.2439635","article-title":"Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element","volume":"62","author":"Burr Geoffrey W.","year":"2015","unstructured":"Geoffrey W. Burr, Robert M. 
Shelby, Severin Sidler, Carmelo Di Nolfo, Junwoo Jang, Irem Boybat, Rohit S. Shenoy, et\u00a0al. 2015. Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element. IEEE Transactions on Electron Devices 62, 11 (2015), 3498\u20133507.","journal-title":"IEEE Transactions on Electron Devices"},{"key":"e_1_3_1_13_2","first-page":"335","volume-title":"Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI\u201906)","author":"Burrows Mike","year":"2006","unstructured":"Mike Burrows. 2006. The chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI\u201906). 335\u2013350."},{"doi-asserted-by":"publisher","key":"e_1_3_1_14_2","DOI":"10.1145\/2189750.2151017"},{"issue":"1","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1145\/2654822.2541967","article-title":"DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning","volume":"42","author":"Chen Tianshi","year":"2014","unstructured":"Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM SIGARCH Computer Architecture News 42, 1 (2014), 269\u2013284.","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"e_1_3_1_16_2","first-page":"91","volume-title":"Proceedings of the 27th International ACM Conference on Supercomputing","author":"Cho Sangyeun","year":"2013","unstructured":"Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, and Gregory R. Ganger. 2013. Active disk meets flash: A case for intelligent SSDs. In Proceedings of the 27th International ACM Conference on Supercomputing. 
91\u2013102."},{"key":"e_1_3_1_17_2","article-title":"Near-data processing for differentiable machine learning models","author":"Choe Hyeokjun","year":"2016","unstructured":"Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, and Sungroh Yoon. 2016. Near-data processing for differentiable machine learning models. arXiv preprint arXiv:1610.02273 (2016).","journal-title":"arXiv preprint arXiv:1610.02273"},{"doi-asserted-by":"crossref","unstructured":"Jaeyoung Do Yang-Suk Kee Jignesh M. Patel Chanik Park Kwanghyun Park and David J. DeWitt. 2013. Query processing on smart SSDs: Opportunities and challenges. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201913). ACM New York NY 1221\u20131230. DOI:10.1145\/2463676.2465295","key":"e_1_3_1_18_2","DOI":"10.1145\/2463676.2465295"},{"key":"e_1_3_1_19_2","first-page":"85","volume-title":"Proceedings of the 6th LCI International Conference on Linux Clusters: The HPC Revolution","author":"Felix Evan J.","year":"2006","unstructured":"Evan J. Felix, Kevin Fox, Kevin Regimbal, and Jarek Nieplocha. 2006. Active storage processing in a parallel file system. In Proceedings of the 6th LCI International Conference on Linux Clusters: The HPC Revolution. 85."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1145\/945445.945450","volume-title":"Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP\u201903)","author":"Ghemawat Sanjay","year":"2003","unstructured":"Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP\u201903). ACM, New York, NY, 29\u201343. DOI:10.1145\/945445.945450"},{"key":"e_1_3_1_21_2","first-page":"153","volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916)","author":"Gu Boncheol","year":"2016","unstructured":"Boncheol Gu, Andre S. 
Yoon, Duck-Ho Bae, Insoon Jo, Jinyoung Lee, Jonghyun Yoon, Jeong-Uk Kang, et\u00a0al. 2016. Biscuit: A framework for near-data processing of big data workloads. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916). IEEE, Los Alamitos, CA, 153\u2013165. DOI:10.1109\/ISCA.2016.23"},{"key":"e_1_3_1_22_2","first-page":"770","volume-title":"Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 770\u2013778. DOI:10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_23_2","first-page":"1","volume-title":"Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA\u201915)","author":"Jun Sang-Woo","year":"2015","unstructured":"Sang-Woo Jun, Ming Liu, Sungjin Lee, Jamey Hicks, John Ankcorn, Myron King, Shuotao Xu, et\u00a0al. 2015. BlueDBM: An appliance for big data analytics. In Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA\u201915). IEEE, Los Alamitos, CA, 1\u201313."},{"key":"e_1_3_1_24_2","first-page":"411","volume-title":"Proceedings of the 45th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201918)","author":"Jun Sang Woo","year":"2018","unstructured":"Sang Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Using accelerated flash storage for external graph analytics. In Proceedings of the 45th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201918). IEEE, Los Alamitos, CA, 411\u2013424. 
DOI:10.1109\/ISCA.2018.00042"},{"key":"e_1_3_1_25_2","first-page":"1","volume-title":"Proceedings of the 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST\u201913)","author":"Kang Yangwook","year":"2013","unstructured":"Yangwook Kang, Yang-Suk Kee, Ethan L. Miller, and Chanik Park. 2013. Enabling cost-effective data processing with smart SSD. In Proceedings of the 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST\u201913). IEEE, Los Alamitos, CA, 1\u201312."},{"issue":"3","key":"e_1_3_1_26_2","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1145\/290593.290602","article-title":"A case for intelligent disks (IDISKs)","volume":"27","author":"Keeton Kimberly","year":"1998","unstructured":"Kimberly Keeton, David A. Patterson, and Joseph M. Hellerstein. 1998. A case for intelligent disks (IDISKs). ACM SIGMOD Record 27, 3 (1998), 42\u201352.","journal-title":"ACM SIGMOD Record"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1109\/IISWC.2013.6704670","volume-title":"Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC\u201913)","author":"Kestor Gokcen","year":"2013","unstructured":"Gokcen Kestor, Roberto Gioiosa, Darren J. Kerbyson, and Adolfy Hoisie. 2013. Quantifying the energy cost of data movement in scientific applications. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC\u201913). IEEE, Los Alamitos, CA, 56\u201365."},{"key":"e_1_3_1_28_2","volume-title":"Proceedings of the International Workshop on Accelerating Data Management Systems (ADMS\u201911)","author":"Kim Sungchan","year":"2011","unstructured":"Sungchan Kim, Hyunok Oh, Chanik Park, Sangyeun Cho, and Sang-Won Lee. 2011. Fast, energy efficient scan inside flash memory SSDs. 
In Proceedings of the International Workshop on Accelerating Data Management Systems (ADMS\u201911)."},{"key":"e_1_3_1_29_2","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.ins.2015.07.056","article-title":"In-storage processing of database scans and joins","volume":"327","author":"Kim Sungchan","year":"2016","unstructured":"Sungchan Kim, Hyunok Oh, Chanik Park, Sangyeun Cho, Sang-Won Lee, and Bongki Moon. 2016. In-storage processing of database scans and joins. Information Sciences 327 (2016), 183\u2013200.","journal-title":"Information Sciences"},{"key":"e_1_3_1_30_2","first-page":"219","volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50\u201917)","author":"Koo Gunjae","year":"2017","unstructured":"Gunjae Koo, Kiran Kumar Matam, Te I, H. V. Krishna Giri Narra, Jing Li, Hung-Wei Tseng, Steven Swanson, and Murali Annavaram. 2017. Summarizer: Trading communication with computing near storage. In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50\u201917). ACM, New York, NY, 219\u2013231. DOI:10.1145\/3123939.3124553"},{"issue":"2","key":"e_1_3_1_31_2","article-title":"Eusocial storage devices: Offloading data management to storage devices that can act collectively","volume":"43","author":"Kufeldt Philip","year":"2018","unstructured":"Philip Kufeldt, Carlos Maltzahn, Tim Feldman, Christine Green, Grant Mackey, and Shingo Tanaka. 2018. Eusocial storage devices: Offloading data management to storage devices that can act collectively. ;login: Usenix Magazine 43, 2 (2018), 16\u201322.","journal-title":";login: Usenix Magazine"},{"key":"e_1_3_1_32_2","first-page":"830","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201916)","author":"Kumar Pradeep","year":"2016","unstructured":"Pradeep Kumar and H. Howie Huang. 2016. 
G-store: High-performance graph store for trillion-edge processing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201916). IEEE, Los Alamitos, CA, 830\u2013841."},{"issue":"4","key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3364180","article-title":"GraphOne: A data store for real-time analytics on evolving graphs","volume":"15","author":"Kumar Pradeep","year":"2020","unstructured":"Pradeep Kumar and H. Howie Huang. 2020. GraphOne: A data store for real-time analytics on evolving graphs. ACM Transactions on Storage 15, 4 (2020), 1\u201340.","journal-title":"ACM Transactions on Storage"},{"issue":"12","key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"1706","DOI":"10.14778\/3137765.3137776","article-title":"ExtraV: boosting graph processing near storage with a coherent accelerator","volume":"10","author":"Lee Jinho","year":"2017","unstructured":"Jinho Lee, Heesu Kim, Sungjoo Yoo, Kiyoung Choi, H. Peter Hofstee, Gi-Joon Nam, Mark R. Nutter, and Damir Jamsek. 2017. ExtraV: boosting graph processing near storage with a coherent accelerator. Proceedings of the VLDB Endowment 10, 12 (2017), 1706\u20131717.","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"2","key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1109\/LCA.2020.3009347","article-title":"SmartSSD: FPGA accelerated near-storage data analytics on SSD","volume":"19","author":"Lee Joo Hwan","year":"2020","unstructured":"Joo Hwan Lee, Hui Zhang, Veronica Lagrange, Praveen Krishnamoorthy, Xiaodong Zhao, and Yang Seok Ki. 2020. SmartSSD: FPGA accelerated near-storage data analytics on SSD. IEEE Computer Architecture Letters 19, 2 (2020), 110\u2013113.","journal-title":"IEEE Computer Architecture Letters"},{"doi-asserted-by":"crossref","unstructured":"Chao Li Yang Hu Longjun Liu Juncheng Gu Mingcong Song Xiaoyao Liang Jingling Yuan and Tao Li. 2015. 
Towards sustainable in-situ server systems in the big data era. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA\u201915). ACM New York NY 14\u201326. DOI:10.1145\/2749469.2750381","key":"e_1_3_1_36_2","DOI":"10.1145\/2749469.2750381"},{"key":"e_1_3_1_37_2","first-page":"395","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Liang Shengwen","year":"2019","unstructured":"Shengwen Liang, Ying Wang, Youyou Lu, Zhe Yang, Huawei Li, and Xiaowei Li. 2019. Cognitive SSD: A deep learning engine for in-storage data retrieval. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919). 395\u2013410. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/liang."},{"key":"e_1_3_1_38_2","first-page":"27","volume-title":"Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW\u201915)","author":"Lin Kevin","year":"2015","unstructured":"Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep learning of binary hash codes for fast image retrieval. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW\u201915). 27\u201335. DOI:10.1109\/CVPRW.2015.7301269"},{"key":"e_1_3_1_39_2","first-page":"2064","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Liu Haomiao","year":"2016","unstructured":"Haomiao Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. 2016. Deep supervised hashing for fast image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
2064\u20132072."},{"issue":"14","key":"e_1_3_1_40_2","doi-asserted-by":"crossref","first-page":"1844","DOI":"10.1002\/adma.201104104","article-title":"Real-time observation on dynamic growth\/dissolution of conductive filaments in oxide-electrolyte-based ReRAM","volume":"24","author":"Liu Qi","year":"2012","unstructured":"Qi Liu, Jun Sun, Hangbing Lv, Shibing Long, Kuibo Yin, Neng Wan, Yingtao Li, Litao Sun, and Ming Liu. 2012. Real-time observation on dynamic growth\/dissolution of conductive filaments in oxide-electrolyte-based ReRAM. Advanced Materials 24, 14 (2012), 1844\u20131849.","journal-title":"Advanced Materials"},{"doi-asserted-by":"publisher","key":"e_1_3_1_41_2","DOI":"10.1109\/JSSC.2013.2280296"},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1145\/3064176.3064191","volume-title":"Proceedings of the 12th European Conference on Computer Systems","author":"Maass Steffen","year":"2017","unstructured":"Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a trillion-edge graph on a single machine. In Proceedings of the 12th European Conference on Computer Systems. 527\u2013543."},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1145\/3352460.3358320","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Mailthody Vikram Sharma","year":"2019","unstructured":"Vikram Sharma Mailthody, Zaid Qureshi, Weixin Liang, Ziyan Feng, Simon Garcia De Gonzalo, Youjie Li, Hubertus Franke, Jinjun Xiong, Jian Huang, and Wen-Mei Hwu. 2019. DeepStore: In-storage acceleration for intelligent queries. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture. 
224\u2013238."},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"4133","DOI":"10.1109\/ISCAS.2010.5537602","volume-title":"Proceedings of 2010 IEEE International Symposium on Circuits and Systems","author":"Manolakos Elias S.","year":"2010","unstructured":"Elias S. Manolakos and Ioannis Stamoulias. 2010. IP-cores design for the kNN classifier. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems. IEEE, Los Alamitos, CA, 4133\u20134136."},{"key":"e_1_3_1_45_2","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1109\/MICRO.2018.00060","volume-title":"Proceedings of the 2018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201918)","author":"Mao Haiyu","year":"2018","unstructured":"Haiyu Mao, Mingcong Song, Tao Li, Yuting Dai, and Jiwu Shu. 2018. LerGAN: A zero-free, low data movement and PIM-based GAN architecture. In Proceedings of the 2018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201918). IEEE, Los Alamitos, CA, 669\u2013681."},{"key":"e_1_3_1_46_2","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1145\/3307650.3322275","volume-title":"Proceedings of the 46th International Symposium on Computer Architecture (ISCA\u201919).","author":"Matam Kiran Kumar","year":"2019","unstructured":"Kiran Kumar Matam, Gunjae Koo, Haipeng Zha, Hung-Wei Tseng, and Murali Annavaram. 2019. GraphSSD: Graph semantics aware SSD. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA\u201919). ACM, New York, NY, 116\u2013128. DOI:10.1145\/3307650.3322275"},{"unstructured":"Micron. 2017. Micron NAND Flash by Technology. 
Retrieved February 27 2023 from https:\/\/www.micron.com\/products\/nand-flash.","key":"e_1_3_1_47_2"},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/2541940.2541965","volume-title":"Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201914)","author":"Novakovic Stanko","year":"2014","unstructured":"Stanko Novakovic, Alexandros Daglis, Edouard Bugnion, Babak Falsafi, and Boris Grot. 2014. Scale-out NUMA. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201914). ACM, New York, NY, 3\u201318. DOI:10.1145\/2541940.2541965"},{"key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1109\/ISLPED.2013.6629310","volume-title":"Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED\u201913)","author":"Ouyang Jian","year":"2013","unstructured":"Jian Ouyang, Shiding Lin, Zhenyu Hou, Peng Wang, Yong Wang, and Guangyu Sun. 2013. Active SSD design for energy-efficiency improvement of web-scale data analysis. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED\u201913). IEEE, Los Alamitos, CA, 286\u2013291."},{"key":"e_1_3_1_50_2","article-title":"In-storage computing for Hadoop MapReduce framework: Challenges and possibilities","author":"Park Dongchul","year":"2016","unstructured":"Dongchul Park, Jianguo Wang, and Yang-Suk Kee. 2016. In-storage computing for Hadoop MapReduce framework: Challenges and possibilities. IEEE Transactions on Computers. 
Early access, July 28, 2016.","journal-title":"IEEE Transactions on Computers."},{"key":"e_1_3_1_51_2","first-page":"446","volume-title":"Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920)","author":"Reddi Vijay Janapa","year":"2020","unstructured":"Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, et\u00a0al. 2020. MLPerf inference benchmark. In Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920). IEEE, Los Alamitos, CA, 446\u2013459."},{"key":"e_1_3_1_52_2","volume-title":"Active Disks: Remote Execution for Network-Attached Storage","author":"Riedel Erik","year":"1999","unstructured":"Erik Riedel. 1999. Active Disks: Remote Execution for Network-Attached Storage. Carnegie Mellon University."},{"doi-asserted-by":"publisher","key":"e_1_3_1_53_2","DOI":"10.1109\/2.928624"},{"key":"e_1_3_1_54_2","first-page":"62","volume-title":"Proceedings of the 24th Conference on Very Large Databases","author":"Riedel Erik","year":"1998","unstructured":"Erik Riedel, Garth Gibson, and Christos Faloutsos. 1998. Active storage for large-scale data mining and multimedia applications. In Proceedings of the 24th Conference on Very Large Databases. 62\u201373."},{"key":"e_1_3_1_55_2","first-page":"379","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Ruan Zhenyuan","year":"2019","unstructured":"Zhenyuan Ruan, Tong He, and Jason Cong. 2019. INSIDER: Designing in-storage computing system for emerging high-performance drive. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919). 
379\u2013394."},{"key":"e_1_3_1_56_2","first-page":"67","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914)","author":"Seshadri Sudharsan","year":"2014","unstructured":"Sudharsan Seshadri, Mark Gahagan, Sundaram Bhaskaran, Trevor Bunker, Arup De, Yanqin Jin, Yang Liu, and Steven Swanson. 2014. Willow: A user-programmable SSD. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914). 67\u201380."},{"key":"e_1_3_1_57_2","first-page":"14","volume-title":"Proceedings of the 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201916).","author":"Shafiee Ali","year":"2016","unstructured":"Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proceedings of the 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201916). IEEE, Los Alamitos, CA, 14\u201326. DOI:10.1109\/ISCA.2016.12"},{"unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2020. Megatron-LM: Training multi-billion parameter language models using model parallelism. arxiv:1909.08053 [cs.CL] (2020).","key":"e_1_3_1_58_2"},{"key":"e_1_3_1_59_2","first-page":"258","volume-title":"Proceedings of the 2014 International Conference on Cloud and Autonomic Computing","author":"Son Yongseok","year":"2014","unstructured":"Yongseok Son, Nae Young Song, Hyuck Han, Hyeonsang Eom, and Heon Young Yeom. 2014. A user-level file system for fast storage devices. In Proceedings of the 2014 International Conference on Cloud and Autonomic Computing. 
IEEE, Los Alamitos, CA, 258\u2013264."},{"key":"e_1_3_1_60_2","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1109\/HPCA.2017.55","volume-title":"Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917)","author":"Song Linghao","year":"2017","unstructured":"Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. 2017. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917). IEEE, Los Alamitos, CA, 541\u2013552. DOI:10.1109\/HPCA.2017.55"},{"key":"e_1_3_1_61_2","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1109\/HPCA.2018.00052","volume-title":"Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Song Linghao","year":"2018","unstructured":"Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2018. GraphR: Accelerating graph processing using ReRAM. In Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918). IEEE, Los Alamitos, CA, 531\u2013543."},{"key":"e_1_3_1_62_2","article-title":"BERT rediscovers the classical NLP pipeline","author":"Tenney Ian","year":"2019","unstructured":"Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. arXiv preprint arXiv:1905.05950 (2019).","journal-title":"arXiv preprint arXiv:1905.05950"},{"key":"e_1_3_1_63_2","first-page":"119","volume-title":"Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Tiwari Devesh","year":"2013","unstructured":"Devesh Tiwari, Simona Boboila, Sudharshan S. Vazhkudai, Youngjae Kim, Xiaosong Ma, Peter Desnoyers, and Yan Solihin. 2013. Active flash: Towards energy-efficient, in-situ data analytics on extreme-scale machines. 
In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913). 119\u2013132. https:\/\/www.usenix.org\/conference\/fast13\/technical-sessions\/presentation\/tiwari."},{"issue":"3","key":"e_1_3_1_64_2","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1145\/3007787.3001143","article-title":"Morpheus: Creating application objects efficiently for heterogeneous computing","volume":"44","author":"Tseng Hung-Wei","year":"2016","unstructured":"Hung-Wei Tseng, Qianchen Zhao, Yuxiao Zhou, Mark Gahagan, and Steven Swanson. 2016. Morpheus: Creating application objects efficiently for heterogeneous computing. ACM SIGARCH Computer Architecture News 44, 3 (2016), 53\u201365.","journal-title":"ACM SIGARCH Computer Architecture News"},{"issue":"4","key":"e_1_3_1_65_2","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmat4856","article-title":"A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing","volume":"16","author":"Burgt Yoeri van de","year":"2017","unstructured":"Yoeri van de Burgt, Ewout Lubberman, Elliot J. Fuller, Scott T. Keene, Gr\u00e9gorio C. Faria, Sapan Agarwal, Matthew J. Marinella, A. Alec Talin, and Alberto Salleo. 2017. A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing. Nature Materials 16, 4 (2017), 414\u2013418.","journal-title":"Nature Materials"},{"key":"e_1_3_1_66_2","first-page":"1","volume-title":"Proceedings of the 16th International Workshop on Data Management on New Hardware","author":"Vincon Tobias","year":"2020","unstructured":"Tobias Vincon, Arthur Bernhardt, Ilia Petrov, Lukas Weber, and Andreas Koch. 2020. nKV: Near-data processing with KV-stores on native computational storage. In Proceedings of the 16th International Workshop on Data Management on New Hardware. 
1\u201311."},{"key":"e_1_3_1_67_2","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1145\/3219819.3219869","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201918)","author":"Wang Jizhe","year":"2018","unstructured":"Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201918). ACM, New York, NY, 839\u2013848. DOI:10.1145\/3219819.3219869"},{"key":"e_1_3_1_68_2","volume-title":"Proceedings of the 12th International Workshop on Data Management on New Hardware (DaMoN\u201916)","author":"Wang Jianguo","year":"2016","unstructured":"Jianguo Wang, Dongchul Park, Yang-Suk Kee, Yannis Papakonstantinou, and Steven Swanson. 2016. SSD in-storage computing for list intersection. In Proceedings of the 12th International Workshop on Data Management on New Hardware (DaMoN\u201916). ACM, New York, NY, Article 4, 7 pages. DOI:10.1145\/2933349.2933353"},{"key":"e_1_3_1_69_2","first-page":"1","volume-title":"Proceedings of the 2008 IEEE International Electron Devices Meeting","author":"Wei Zhiqiang","year":"2008","unstructured":"Zhiqiang Wei, Y. Kanzawa, K. Arita, Y. Katoh, K. Kawai, S. Muraoka, S. Mitani, et\u00a0al. 2008. Highly reliable TaOx ReRAM and direct evidence of redox reaction mechanism. In Proceedings of the 2008 IEEE International Electron Devices Meeting. 
IEEE, Los Alamitos, CA, 1\u20134."},{"doi-asserted-by":"publisher","key":"e_1_3_1_70_2","DOI":"10.14778\/2732967.2732972"},{"key":"e_1_3_1_71_2","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1145\/2463676.2463685","volume-title":"Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201913)","author":"Woods Louis","year":"2013","unstructured":"Louis Woods, Jens Teubner, and Gustavo Alonso. 2013. Less watts, more performance: An intelligent storage engine for data appliances. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201913). ACM, New York, NY, 1073\u20131076. DOI:10.1145\/2463676.2463685"},{"key":"e_1_3_1_72_2","first-page":"512","volume-title":"Proceedings of the Scandinavian Conference on Image Analysis","author":"Yeh Yao-Jung","year":"2007","unstructured":"Yao-Jung Yeh, Hui-Ya Li, Wen-Jyi Hwang, and Chiung-Yao Fang. 2007. FPGA implementation of kNN classifier based on wavelet transform and partial distance search. In Proceedings of the Scandinavian Conference on Image Analysis. 512\u2013521."},{"key":"e_1_3_1_73_2","first-page":"45","volume-title":"Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST\u201915)","author":"Zheng Da","year":"2015","unstructured":"Da Zheng, Disa Mhembere, Randal C. Burns, Joshua T. Vogelstein, Carey E. Priebe, and Alexander S. Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST\u201915). 45\u201358. 
https:\/\/www.usenix.org\/conference\/fast15\/technical-sessions\/presentation\/zheng."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563456","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3563456","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:36Z","timestamp":1750182576000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,9]]},"references-count":72,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,11,30]]}},"alternative-id":["10.1145\/3563456"],"URL":"https:\/\/doi.org\/10.1145\/3563456","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2023,11,9]]},"assertion":[{"value":"2021-12-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-28","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}