{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,13]],"date-time":"2025-09-13T15:49:25Z","timestamp":1757778565443,"version":"3.41.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T00:00:00Z","timestamp":1698278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2022YFB4500303"],"award-info":[{"award-number":["2022YFB4500303"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072198, 61825202, and 61929103"],"award-info":[{"award-number":["62072198, 61825202, and 61929103"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>\n            <jats:italic>Computing-in-Memory<\/jats:italic>\n            (CIM) architectures using\n            <jats:italic>Non-volatile Memories<\/jats:italic>\n            (NVMs) have emerged as a promising way to address the \u201cmemory wall\u201d problem in traditional Von Neumann architectures. CIM accelerators can perform arithmetic or Boolean logic operations in NVMs by fully exploiting their high parallelism for bit-wise operations. These accelerators are often used in cooperation with general-purpose processors to speed up a wide variety of artificial neural network applications. In such a heterogeneous computing architecture, the legacy software should be redesigned and re-engineered to utilize new CIM accelerators. In this article, we propose a compilation tool to automatically migrate legacy programs to such heterogeneous architectures based on the\n            <jats:italic>low-level virtual machine<\/jats:italic>\n            (LLVM) compiler infrastructure. To accelerate some computations such as vector-matrix multiplication in CIM accelerators, we identify several typical computing patterns from LLVM\n            <jats:italic>intermediate representations<\/jats:italic>\n            , which are oblivious to high-level programming paradigms. Our compilation tool can modify accelerable LLVM IRs to offload them to CIM accelerators automatically, without re-engineering legacy software. Experimental results show that our compilation tool can translate many legacy programs to CIM-supported binary executables effectively, and improve application performance and energy efficiency by up to 51\u00d7 and 309\u00d7, respectively, compared with general-purpose x86 processors.\n          <\/jats:p>","DOI":"10.1145\/3617686","type":"journal-article","created":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T12:17:28Z","timestamp":1693916248000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Compilation Tool for Computation Offloading in ReRAM-based CIM Architectures"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3934-7605","authenticated-orcid":false,"given":"Hai","family":"Jin","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-3159-2614","authenticated-orcid":false,"given":"Bo","family":"Lei","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4290-1408","authenticated-orcid":false,"given":"Haikun","family":"Liu","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6302-813X","authenticated-orcid":false,"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3950-3209","authenticated-orcid":false,"given":"Zhuohui","family":"Duan","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3432-855X","authenticated-orcid":false,"given":"Chencheng","family":"Ye","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2052-2231","authenticated-orcid":false,"given":"Yu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, China"}]}],"member":"320","published-online":{"date-parts":[[2023,10,26]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Dagger. 2018. A Binary Translator to LLVM IR. Retrieved from https:\/\/github.com\/repzret\/dagger"},{"key":"e_1_3_1_3_2","unstructured":"Anvill. 2022. A Reverse Engineering Tool. Retrieved from https:\/\/github.com\/lifting-bits\/anvill"},{"key":"e_1_3_1_4_2","unstructured":"McSema. 2022. A Framework for Lifting Binaries to LLVM Bitcode. Retrieved from https:\/\/github.com\/lifting-bits\/mcsema"},{"key":"e_1_3_1_5_2","unstructured":"Reopt. 2022. A Tool for Analyzing X86-64 Binaries. Retrieved from https:\/\/github.com\/GaloisInc\/reopt"},{"key":"e_1_3_1_6_2","unstructured":"RetDec. 2022. A Retargetable Machine-code Decompiler Based on LLVM. Retrieved from https:\/\/github.com\/avast\/retdec"},{"key":"e_1_3_1_7_2","unstructured":"STAR. 2023. The STAR Experiment. Retrieved from https:\/\/www.star.bnl.gov\/"},{"key":"e_1_3_1_8_2","first-page":"564","volume-title":"Proceedings of Design, Automation, and Test in Europe Conference and Exhibition","author":"Ahmed Hameeza","year":"2019","unstructured":"Hameeza Ahmed, Paulo C. Santos, Jo\u00e3o P. C. Lima, Rafael F. Moura, Marco A. Z. Alves, Ant\u00f4nio C. S. Beck, and Luigi Carro. 2019. A compiler for automatic selection of suitable processing-in-memory instructions. In Proceedings of Design, Automation, and Test in Europe Conference and Exhibition. 564\u2013569."},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Rebooting Computing","author":"Ambrosi Joao","year":"2018","unstructured":"Joao Ambrosi, Aayush Ankit, Rodrigo Antunes, Sai Rahul Chalamalasetti, Soumitra Chatterjee, Izzat El Hajj, Guilherme Fachini, Paolo Faraboschi, Martin Foltin, Sitao Huang, Wen-Mei Hwu, Gustavo Knuppe, Sunil Vishwanathpur Lakshminarasimha, Dejan Milojicic, Mohan Parthasarathy, Filipe Ribeiro, Lucas Rosa, Kaushik Roy, Plinio Silveira, and John Paul Strachan. 2018. Hardware-software co-design for an analog-digital accelerator for machine learning. In Proceedings of the IEEE International Conference on Rebooting Computing. 1\u201313."},{"issue":"8","key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1109\/TC.2020.2998456","article-title":"PANTHER: A programmable architecture for neural network training harnessing energy-efficient ReRAM","volume":"69","author":"Ankit Aayush","year":"2020","unstructured":"Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-Mei Hwu, and Kaushik Roy. 2020. PANTHER: A programmable architecture for neural network training harnessing energy-efficient ReRAM. IEEE Trans. Comput. 69, 8 (2020), 1128\u20131142.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_1_11_2","first-page":"715","volume-title":"Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Ankit Aayush","year":"2019","unstructured":"Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei W. Hwu, John Paul Strachan, Kaushik Roy, and Dejan S. Milojicic. 2019. PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 715\u2013731."},{"key":"e_1_3_1_12_2","first-page":"1","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912)","author":"Beamer Scott","year":"2012","unstructured":"Scott Beamer, Krste Asanovic, and David Patterson. 2012. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912). 1\u201310."},{"key":"e_1_3_1_13_2","first-page":"782","volume-title":"Proceedings of Design, Automation, and Test in Europe Conference and Exhibition","author":"Bhattacharjee Debjyoti","year":"2017","unstructured":"Debjyoti Bhattacharjee, Rajeswari Devadoss, and Anupam Chattopadhyay. 2017. ReVAMP: ReRAM based VLIW architecture for in-memory computing. In Proceedings of Design, Automation, and Test in Europe Conference and Exhibition. 782\u2013787."},{"key":"e_1_3_1_14_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Rebooting Computing","author":"Chakraborty Dwaipayan","year":"2017","unstructured":"Dwaipayan Chakraborty, Sunny Raj, Julio Cesar Gutierrez, Troyle Thomas, and Sumit Kumar Jha. 2017. In-memory execution of compute kernels using flow-based memristive crossbar computing. In Proceedings of the IEEE International Conference on Rebooting Computing. 1\u20136."},{"issue":"6","key":"e_1_3_1_15_2","first-page":"1","article-title":"Energy-efficient computing-in-memory architecture for AI processor: Device, circuit, architecture perspective","volume":"64","author":"Chang Liang","year":"2021","unstructured":"Liang Chang, Chenglong Li, Zhaomin Zhang, Jianbiao Xiao, Qingsong Liu, Zhen Zhu, Weihang Li, Zixuan Zhu, Siqi Yang, and Jun Zhou. 2021. Energy-efficient computing-in-memory architecture for AI processor: Device, circuit, architecture perspective. Sci. China Info. Sci. 64, 6 (2021), 1\u201315.","journal-title":"Sci. China Info. Sci."},{"key":"e_1_3_1_16_2","first-page":"246","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium","author":"Chen Dan","year":"2022","unstructured":"Dan Chen, Hai Jin, Long Zheng, Yu Huang, Pengcheng Yao, Chuangyi Gui, Qinggang Wang, Haifeng Liu, Haiheng He, Xiaofei Liao, and Ran Zheng. 2022. A general offloading approach for near-DRAM processing-in-memory architectures. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium. 246\u2013257."},{"issue":"12","key":"e_1_3_1_17_2","doi-asserted-by":"crossref","first-page":"3067","DOI":"10.1109\/TCAD.2018.2789723","article-title":"NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning","volume":"37","author":"Chen Pai-Yu","year":"2018","unstructured":"Pai-Yu Chen, Xiaochen Peng, and Shimeng Yu. 2018. NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 37, 12 (2018), 3067\u20133080.","journal-title":"IEEE Trans. Comput.-Aided Design Integr. Circ. Syst."},{"key":"e_1_3_1_18_2","first-page":"27","volume-title":"Proceedings of the 43rd Annual International Symposium on Computer Architecture","author":"Chi Ping","year":"2016","unstructured":"Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proceedings of the 43rd Annual International Symposium on Computer Architecture. 27\u201339."},{"key":"e_1_3_1_19_2","unstructured":"Intel Corporation. 2009. Intel 64 and IA-32 architectures optimization reference manual. Retrieved from https:\/\/cdrdv2.intel.com\/v1\/dl\/getContent\/671488?fileName=248966-046A-software-optimization-manual.pdf"},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1007\/s10766-015-0396-z","article-title":"Using machine learning techniques to detect parallel patterns of multi-threaded applications","volume":"44","author":"Deniz Etem","year":"2016","unstructured":"Etem Deniz and Alper Sen. 2016. Using machine learning techniques to detect parallel patterns of multi-threaded applications. Int. J. Parallel Program. 44 (2016), 867\u2013900.","journal-title":"Int. J. Parallel Program."},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1145\/3470496.3527431","volume-title":"Proceedings of the 49th Annual International Symposium on Computer Architecture","author":"Devic Alexandar","year":"2022","unstructured":"Alexandar Devic, Siddhartha Balakrishna Rai, Anand Sivasubramaniam, Ameen Akel, Sean Eilert, and Justin Eno. 2022. To PIM or not for emerging general purpose processing in DDR memory systems. In Proceedings of the 49th Annual International Symposium on Computer Architecture. 231\u2013244."},{"key":"e_1_3_1_22_2","first-page":"1","volume-title":"Proceedings of the 10th International Workshop on Polyhedral Compilation Techniques","author":"Drebes Andi","year":"2020","unstructured":"Andi Drebes, Lorenzo Chelini, Oleksandr Zinenko, Albert Cohen, Henk Corporaal, Tobias Grosser, Kanishkan Vadivel, and Nicolas Vasilache. 2020. TC-CIM: Empowering tensor comprehensions for computing-in-memory. In Proceedings of the 10th International Workshop on Polyhedral Compilation Techniques. 1\u201312."},{"issue":"2","key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3296957.3173171","article-title":"In-memory data parallel processor","volume":"53","author":"Fujiki Daichi","year":"2018","unstructured":"Daichi Fujiki, Scott Mahlke, and Reetuparna Das. 2018. In-memory data parallel processor. SIGPLAN Not. 53, 2 (Mar. 2018), 1\u201314.","journal-title":"SIGPLAN Not."},{"key":"e_1_3_1_24_2","first-page":"601","volume-title":"Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering","author":"Gui Yi","year":"2022","unstructured":"Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, and Hai Jin. 2022. Cross-language binary-source code matching with intermediate representations. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering. 601\u2013612."},{"key":"e_1_3_1_25_2","first-page":"486","volume-title":"Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition","author":"Hamdioui Said","year":"2019","unstructured":"Said Hamdioui, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Abu Sebastian, Manuel Le Gallo, Sandeep Pande, Siebren Schaafsma, Francky Catthoor, Shidhartha Das, Fernando G. Redondo, G. Karunaratne, Abbas Rahimi, and Luca Benini. 2019. Applications of computation-in-memory architectures based on memristive devices. In Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition. 486\u2013491."},{"key":"e_1_3_1_26_2","first-page":"1718","volume-title":"Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition","author":"Hamdioui Said","year":"2015","unstructured":"Said Hamdioui, Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Koen Bertels, Henk Corporaal, Hailong Jiao, Francky Catthoor, Dirk Wouters, Linn Eike, and Jan van Lunteren. 2015. Memristor based computation-in-memory architecture for data-intensive applications. In Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition. 1718\u20131725."},{"key":"e_1_3_1_27_2","first-page":"759","volume-title":"Proceedings of the International Conference on High Performance Computing Simulation","author":"Haron Adib","year":"2016","unstructured":"Adib Haron, Jintao Yu, Razvan Nane, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2016. Parallel matrix multiplication on memristor-based computation-in-memory architecture. In Proceedings of the International Conference on High Performance Computing Simulation. 759\u2013766."},{"key":"e_1_3_1_28_2","first-page":"498","volume-title":"Proceedings of the 49th Annual Design Automation Conference","author":"Hu Miao","year":"2012","unstructured":"Miao Hu, Hai Li, Qing Wu, and Garrett S. Rose. 2012. Hardware realization of BSB recall function using memristor crossbar arrays. In Proceedings of the 49th Annual Design Automation Conference. 498\u2013503."},{"key":"e_1_3_1_29_2","first-page":"802","volume-title":"Proceedings of the ACM\/IEEE 46th Annual International Symposium on Computer Architecture","author":"Imani Mohsen","year":"2019","unstructured":"Mohsen Imani, Saransh Gupta, Yeseong Kim, and Tajana Rosing. 2019. FloatPIM: In-memory acceleration of deep neural network training with high precision. In Proceedings of the ACM\/IEEE 46th Annual International Symposium on Computer Architecture. 802\u2013815."},{"issue":"4","key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1109\/TCAD.2018.2819080","article-title":"NVQuery: Efficient query processing in nonvolatile memory","volume":"38","author":"Imani Mohsen","year":"2019","unstructured":"Mohsen Imani, Saransh Gupta, Sahil Sharma, and Tajana Simunic Rosing. 2019. NVQuery: Efficient query processing in nonvolatile memory. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 38, 4 (2019), 628\u2013639.","journal-title":"IEEE Trans. Comput.-Aided Design Integr. Circ. Syst."},{"key":"e_1_3_1_31_2","unstructured":"Intel. 2014. Intel Xeon Processor E5-2650 v3. Retrieved from https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/81705\/intel-xeon-processor-e5-2650-v3-25m-cache-2-30-ghz.html"},{"issue":"11","key":"e_1_3_1_32_2","first-page":"2872","article-title":"ReHy: A ReRAM-based digital\/analog hybrid PIM architecture for accelerating CNN training","volume":"33","author":"Jin Hai","year":"2021","unstructured":"Hai Jin, Cong Liu, Haikun Liu, Ruikun Luo, Jiahong Xu, Fubing Mao, and Xiaofei Liao. 2021. ReHy: A ReRAM-based digital\/analog hybrid PIM architecture for accelerating CNN training. IEEE Trans. Parallel Distrib. Syst. 33, 11 (2021), 2872\u20132884.","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"issue":"11","key":"e_1_3_1_33_2","first-page":"895","article-title":"MAGIC\u2013Memristor-Aided logic","volume":"61","author":"Kvatinsky Shahar","year":"2014","unstructured":"Shahar Kvatinsky, Dmitry Belousov, Slavik Liman, Guy Satat, Nimrod Wald, Eby G. Friedman, Avinoam Kolodny, and Uri C. Weiser. 2014. MAGIC\u2013Memristor-Aided logic. IEEE Trans. Circ. Syst. II: Express Briefs 61, 11 (2014), 895\u2013899.","journal-title":"IEEE Trans. Circ. Syst. II: Express Briefs"},{"key":"e_1_3_1_34_2","first-page":"242","volume-title":"Proceedings of the International Symposium on Low Power Electronics and Design","author":"Li Boxun","year":"2013","unstructured":"Boxun Li, Yi Shan, Miao Hu, Yu Wang, Yiran Chen, and Huazhong Yang. 2013. Memristor-based approximated computation. In Proceedings of the International Symposium on Low Power Electronics and Design. 242\u2013247."},{"issue":"2","key":"e_1_3_1_35_2","first-page":"1","article-title":"ReCSA: A dedicated sort accelerator using ReRAM-based content addressable memory","volume":"17","author":"Li Huize","year":"2023","unstructured":"Huize Li, Hai Jin, Long Zheng, Yu Huang, and Xiaofei Liao. 2023. ReCSA: A dedicated sort accelerator using ReRAM-based content addressable memory. Front. Comput. Sci. 17, 2 (2023), 1\u201313.","journal-title":"Front. Comput. Sci."},{"key":"e_1_3_1_36_2","first-page":"1","volume-title":"Proceedings of the 53rd ACM\/IEEE Design Automation Conference","author":"Li Shuangchen","year":"2016","unstructured":"Shuangchen Li, Cong Xu, Qiaosha Zou, Jishen Zhao, Yu Lu, and Yuan Xie. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In Proceedings of the 53rd ACM\/IEEE Design Automation Conference. 1\u20136."},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1007\/s11265-014-0917-9","article-title":"Compilers for low power with design patterns on embedded multicore systems","volume":"80","author":"Lin Cheng-Yen","year":"2015","unstructured":"Cheng-Yen Lin, Chi-Bang Kuan, Wen-Li Shih, and Jenq Kuen Lee. 2015. Compilers for low power with design patterns on embedded multicore systems. J. Signal Process. Syst. 80 (2015), 277\u2013293.","journal-title":"J. Signal Process. Syst."},{"key":"e_1_3_1_38_2","first-page":"469","volume-title":"Proceedings of the 59th ACM\/IEEE Design Automation Conference","author":"Liu Cong","year":"2022","unstructured":"Cong Liu, Haikun Liu, Hai Jin, Xiaofei Liao, Yu Zhang, Zhuohui Duan, Jiahong Xu, and Huize Li. 2022. ReGNN: A ReRAM-based heterogeneous architecture for general graph neural networks. In Proceedings of the 59th ACM\/IEEE Design Automation Conference. 469\u2013474."},{"issue":"12","key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"5476","DOI":"10.1109\/TCAD.2022.3152385","article-title":"A simulation framework for memristor-based heterogeneous computing architectures","volume":"41","author":"Liu Haikun","year":"2022","unstructured":"Haikun Liu, Jiahong Xu, Xiaofei Liao, Hai Jin, Yu Zhang, and Fubing Mao. 2022. A simulation framework for memristor-based heterogeneous computing architectures. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 41, 12 (2022), 5476\u20135488.","journal-title":"IEEE Trans. Comput.-Aided Design Integr. Circ. Syst."},{"key":"e_1_3_1_40_2","first-page":"1","volume-title":"Proceedings of the 52nd ACM\/EDAC\/IEEE Design Automation Conference","author":"Liu Xiaoxiao","year":"2015","unstructured":"Xiaoxiao Liu, Mengjie Mao, Beiye Liu, Hai Li, Yiran Chen, Boxun Li, Yu Wang, Hao Jiang, Mark Barnell, Qing Wu, and Jianhua Yang. 2015. RENO: A high-efficient reconfigurable neuromorphic computing accelerator design. In Proceedings of the 52nd ACM\/EDAC\/IEEE Design Automation Conference. 1\u20136."},{"key":"e_1_3_1_41_2","first-page":"1300","volume-title":"Proceedings of the 38th International Conference on Distributed Computing Systems","author":"Milojicic Dejan","year":"2018","unstructured":"Dejan Milojicic, Kirk Bresniker, Gary Campbell, Paolo Faraboschi, John Paul Strachan, and Stan Williams. 2018. Computing in-memory, revisited. In Proceedings of the 38th International Conference on Distributed Computing Systems. 1300\u20131309."},{"key":"e_1_3_1_42_2","first-page":"1","volume-title":"Proceedings of the International Workshop on OpenCL","author":"Nugteren Cedric","year":"2018","unstructured":"Cedric Nugteren. 2018. CLBlast: A tuned OpenCL BLAS library. In Proceedings of the International Workshop on OpenCL. 1\u201310."},{"key":"e_1_3_1_43_2","unstructured":"Louis-No\u00ebl Pouchet and Tomofumi Yuki. 2016. Polybench\/C. Retrieved from http:\/\/web.cs.ucla.edu\/pouchet\/software\/polybench\/"},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1145\/2485922.2485963","volume-title":"Proceedings of the 40th Annual International Symposium on Computer Architecture","author":"Sanchez Daniel","year":"2013","unstructured":"Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In Proceedings of the 40th Annual International Symposium on Computer Architecture. 475\u2013486."},{"key":"e_1_3_1_45_2","first-page":"14","volume-title":"Proceedings of the ACM\/IEEE 43rd Annual International Symposium on Computer Architecture","author":"Shafiee Ali","year":"2016","unstructured":"Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proceedings of the ACM\/IEEE 43rd Annual International Symposium on Computer Architecture. 14\u201326."},{"key":"e_1_3_1_46_2","first-page":"531","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture","author":"Song Linghao","year":"2018","unstructured":"Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2018. GraphR: Accelerating graph processing using ReRAM. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture. 531\u2013543."},{"key":"e_1_3_1_47_2","first-page":"1","volume-title":"Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium","author":"Sun Yuliang","year":"2017","unstructured":"Yuliang Sun, Yu Wang, and Huazhong Yang. 2017. Energy-efficient SQL query exploiting RRAM-based process-in-memory structure. In Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium. 1\u20136."},{"key":"e_1_3_1_48_2","first-page":"1602","volume-title":"Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition","author":"Vadivel Kanishkan","year":"2020","unstructured":"Kanishkan Vadivel, Lorenzo Chelini, Ali BanaGozar, Gagandeep Singh, Stefano Corda, Roel Jordans, and Henk Corporaal. 2020. TDO-CIM: Transparent detection and offloading for computation in-memory. In Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition. 1602\u20131605."},{"key":"e_1_3_1_49_2","first-page":"855","volume-title":"Proceedings of Design, Automation, and Test in Europe Conference and Exhibition","author":"Wei Yizhou","year":"2022","unstructured":"Yizhou Wei, Minxuan Zhou, Sihang Liu, Korakit Seemakhupt, Tajana Rosing, and Samira Khan. 2022. PIMProf: An automated program profiler for processing-in-memory offloading decisions. In Proceedings of Design, Automation, and Test in Europe Conference and Exhibition. 855\u2013860."},{"key":"e_1_3_1_50_2","first-page":"38","volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing","author":"Whaley R. Clint","year":"1998","unstructured":"R. Clint Whaley and Jack J. Dongarra. 1998. Automatically tuned linear algebra software. In Proceedings of the ACM\/IEEE Conference on Supercomputing. 38\u201338."},{"issue":"1","key":"e_1_3_1_51_2","first-page":"556","article-title":"FastBit: An efficient indexing technology for accelerating data-intensive science","volume":"16","author":"Wu Kesheng","year":"2005","unstructured":"Kesheng Wu. 2005. FastBit: An efficient indexing technology for accelerating data-intensive science. J. Phys.: Conf. Ser. 16, 1 (2005), 556\u2013560.","journal-title":"J. Phys.: Conf. Ser."},{"key":"e_1_3_1_52_2","first-page":"103","volume-title":"Proceedings of the IEEE Symposium on VLSI Technology","author":"Wu Wei","year":"2018","unstructured":"Wei Wu, Huaqiang Wu, Bin Gao, Peng Yao, Xiang Zhang, Xiaochen Peng, Shimeng Yu, and He Qian. 2018. A methodology to improve linearity of analog RRAM for neuromorphic computing. In Proceedings of the IEEE Symposium on VLSI Technology. 103\u2013104."},{"key":"e_1_3_1_53_2","unstructured":"Zhang Xianyi Wang Qian and Zaheer Chothia. 2012. OpenBLAS. Retrieved from http:\/\/xianyi.github.io\/OpenBLAS"},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1145\/3316482.3326354","volume-title":"Proceedings of the 20th ACM SIGPLAN\/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","author":"Yadavalli S. Bharadwaj","year":"2019","unstructured":"S. Bharadwaj Yadavalli and Aaron Smith. 2019. Raising binaries to LLVM IR with MCTOLL (WIP paper). In Proceedings of the 20th ACM SIGPLAN\/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems. 213\u2013218."},{"key":"e_1_3_1_55_2","first-page":"71","volume-title":"Proceedings of the Great Lakes Symposium on VLSI","author":"Yu Jintao","year":"2017","unstructured":"Jintao Yu, Tom Hogervorst, and Razvan Nane. 2017. A domain-specific language and compiler for computation-in-memory skeletons. In Proceedings of the Great Lakes Symposium on VLSI. 71\u201376."},{"key":"e_1_3_1_56_2","first-page":"696","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium","author":"Zheng Long","year":"2020","unstructured":"Long Zheng, Jieshan Zhao, Yu Huang, Qinggang Wang, Zhen Zeng, Jingling Xue, Xiaofei Liao, and Hai Jin. 2020. Spara: An energy-efficient ReRAM-based accelerator for sparse graph analytics applications. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium. 696\u2013707."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617686","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617686","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:32Z","timestamp":1750178192000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617686"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,26]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3617686"],"URL":"https:\/\/doi.org\/10.1145\/3617686","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2023,10,26]]},"assertion":[{"value":"2022-11-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}